Unify LLM call-site configuration under llm.{default,profiles,callSites} by siddseethepalli · Pull Request #26159 · vellum-ai/vellum-assistant

siddseethepalli · 2026-04-17T02:47:19Z

Summary

Consolidates ~26 distinct LLM call sites in the assistant under a unified llm.{default, profiles, callSites} schema. Previously, configuration was scattered across config.json behind three different override mechanisms (modelIntent strings, modelOverride strings, and ad-hoc speed/model kwargs). Now every LLM call site can be configured independently (provider, model, maxTokens, thinking, effort, speed, temperature), on top of a shared default baseline and reusable named profiles.
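As a rough illustration of how a default baseline, a named profile, and a call-site override might combine, here is a minimal sketch. The merge order (default, then profile, then call-site fields), the `profile` key on a call-site entry, the loose `ConfigFragment` type, and the `maxTokens: 4096` value are all assumptions for illustration; the PR's actual `resolveCallSiteConfig` and `LLMSchema` may differ.

```typescript
// Hypothetical, loosely typed fragment; the real schema is a Zod LLMSchema.
type ConfigFragment = { [key: string]: unknown };

type LLMSection = {
  default: ConfigFragment;
  profiles: Record<string, ConfigFragment>;
  callSites: Record<string, ConfigFragment>;
};

function isPlainObject(v: unknown): v is ConfigFragment {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Deep merge that always copies nested plain objects, so the result never
// aliases objects owned by `base` or `override`.
function deepMerge(base: ConfigFragment, override: ConfigFragment): ConfigFragment {
  const out: ConfigFragment = {};
  for (const [key, value] of Object.entries(base)) {
    out[key] = isPlainObject(value) ? deepMerge(value, {}) : value;
  }
  for (const [key, value] of Object.entries(override)) {
    if (value === undefined) continue;
    const current = out[key];
    out[key] = isPlainObject(value)
      ? deepMerge(isPlainObject(current) ? current : {}, value)
      : value;
  }
  return out;
}

// Assumed precedence: llm.default < named profile < call-site fields.
function resolveCallSiteConfig(llm: LLMSection, callSite: string): ConfigFragment {
  const { profile, ...siteFields } = llm.callSites[callSite] ?? {};
  const profileFragment = typeof profile === "string" ? llm.profiles[profile] ?? {} : {};
  return deepMerge(deepMerge(llm.default, profileFragment), siteFields);
}

// Illustrative config; only the model IDs and the commitMessage values
// (120 / 0.2) appear elsewhere in this thread.
const llm: LLMSection = {
  default: {
    provider: "anthropic",
    model: "claude-opus-4-7",
    maxTokens: 4096,
    thinking: { enabled: true },
  },
  profiles: {
    fast: { model: "claude-haiku-4-5-20251001", thinking: { enabled: false } },
  },
  callSites: {
    commitMessage: { profile: "fast", maxTokens: 120, temperature: 0.2 },
  },
};

const resolved = resolveCallSiteConfig(llm, "commitMessage");
console.log(resolved);
```

A call site with no entry (for example `mainAgent` here) resolves to an independent copy of `llm.default`, so callers can mutate the result without touching the source config.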

The macOS Settings UI gains:

An "N per-task overrides" badge on the Inference card
A grouped editable sheet for managing per-call-site overrides
An override-aware confirmation dialog when changing the global default provider

Self-review result

PASS after 2 remediation rounds (5 round-1 fix PRs + 1 round-2 fix PR).

PRs merged into feature branch

Plan PRs (in merge order)

Schema + resolver + migration (Waves 1-3):

config(llm): add unified llm schema with call-site enum and profile refines #26089: PR 1 — Add llm schema, enum, refines
config(llm): add resolveCallSiteConfig resolver with deep merge #26094: PR 2 — Add resolveCallSiteConfig resolver
config(llm): add llm field to AssistantConfigSchema (no behavior change) #26095: PR 3 — Wire LLMSchema into AssistantConfigSchema
workspace: migrate scattered LLM config keys into unified llm structure #26101: PR 4 — Workspace migration (scattered keys → llm.*)
providers: accept callSite in per-call config; resolve via resolveCallSiteConfig #26102: PR 5 — Provider layer accepts callSite

Agent loop + bespoke call-site adoption (Waves 4-5):

daemon: thread callSite through processMessage options and adapter callbacks #26115: PR 6 — Thread callSite through processMessage
memory: route extraction/consolidation/retrieval through call-site IDs #26106: PR 12 — Memory: extraction, consolidation, retrieval
memory: route narrative/pattern/summarization/starters through call-site IDs #26107: PR 13 — Memory: narrative, pattern, summarization, starters
workspace+conversation: route commit message and title through call-site IDs #26112: PR 14 — Workspace + conversation title
ui: route identity intro and empty-state greeting through call-site IDs #26108: PR 15 — UI greetings
notifications: route decision and preference extraction through call-site IDs #26109: PR 16 — Notifications
calls+watcher: route guardian copy and watch handlers through call-site IDs #26105: PR 17 — Voice + watch
utility: route classifier and analyzer LLM calls through call-site IDs #26111: PR 18 — Utility classifiers
heartbeat: pass callSite: 'heartbeatAgent' instead of speed kwarg #26125: PR 7 — heartbeatAgent call site
filing: pass callSite: 'filingAgent' instead of speed kwarg #26124: PR 8 — filingAgent call site
runtime/analyze-conversation: route through callSite: 'analyzeConversation' #26126: PR 9 — analyzeConversation call site
subagent: pass callSite: 'subagentSpawn' when spawning isolated agents #26122: PR 10 — subagentSpawn call site
calls: route the call agent loop through callSite: 'callAgent' #26123: PR 11 — callAgent call site

macOS UI (Waves 4-7):

macos(settings): migrate InferenceServiceCard reads/writes to llm.default.* #26113: PR 20 — InferenceServiceCard writes to llm.default.*
macos(settings): add SettingsStore APIs for per-call-site overrides #26128: PR 21 — SettingsStore per-call-site override APIs
macos(settings): show 'N call-site overrides' badge with read-only list sheet #26135: PR 22 — Override-count badge + read-only sheet
macos(settings): confirm default-provider switch when call-site overrides exist #26133: PR 24 — Default-switch confirmation dialog
macos(settings): make per-task override sheet editable with provider/model pickers #26136: PR 23 — Editable override sheet

Cleanup (Wave 8):

config(llm): remove deprecated scattered LLM keys #26140: PR 19 — Remove deprecated scattered keys

Round-1 self-review fix PRs

fix(config-loader): treat JSON null as key deletion in deepMergeOverwrite #26153: deepMergeOverwrite treats JSON null as recursive key deletion
fix(agent-loop): default user-initiated turns to callSite: 'mainAgent' #26154: Agent loop defaults user-initiated turns to callSite: 'mainAgent' (was deferred for a now-resolved race)
fix(meet-join): migrate consent-monitor + session-manager to callSite contract #26155: skills/meet-join/daemon callers migrated to the unified callSite contract (added meetConsentMonitor, meetChatOpportunity)
fix(macos): atomic provider+model save via single PATCH #26156: Atomic macOS provider+model save via single setLLMDefault(provider:model:) PATCH
fix(cleanup): remove dead code, refresh comments, add migration test, update docs #26157: Bundled cleanup — dead code, comment refresh, type consolidation, doc updates, migration 039 test, LLMSchema re-exports
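The null-as-deletion behavior described for deepMergeOverwrite in this list can be sketched roughly as follows. This is an assumption-laden illustration, not the loader's code: the in-place signature and the `Json` types are hypothetical, and it reflects only the round-1 description (a later follow-up in this thread narrows deletion to object targets and assigns null on scalar/null targets).

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Merge `patch` into `target` in place. A JSON null in the patch deletes the
// corresponding key at any depth instead of storing null.
function deepMergeOverwrite(target: JsonObject, patch: JsonObject): void {
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete target[key];
      continue;
    }
    const existing = target[key];
    if (isObject(value) && isObject(existing)) {
      deepMergeOverwrite(existing, value);
    } else if (isObject(value)) {
      // New subtree: build it through the same path so nested nulls are dropped.
      const fresh: JsonObject = {};
      deepMergeOverwrite(fresh, value);
      target[key] = fresh;
    } else {
      target[key] = value;
    }
  }
}

const config: JsonObject = {
  llm: {
    callSites: {
      commitMessage: { maxTokens: 120 },
      heartbeatAgent: { maxTokens: 1024 },
    },
  },
};

// Clearing one call-site entry with an entry-level null leaves siblings alone.
deepMergeOverwrite(config, { llm: { callSites: { commitMessage: null } } });
console.log(JSON.stringify(config));
// {"llm":{"callSites":{"heartbeatAgent":{"maxTokens":1024}}}}
```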

Round-2 self-review fix PR

fix(r2): catalog test count, skill self-knowledge doc, AGENTS.md, loader docstring #26158: Catalog test count fix, vellum-self-knowledge skill doc refresh, AGENTS.md updates, deepMergeOverwrite doc tightening

Migration safety

Two workspace migrations land:

038-unify-llm-callsite-configs: backfills llm.{default, callSites, pricingOverrides} from scattered legacy keys (idempotent, deep-merges with any pre-existing llm.callSites/profiles)
039-drop-legacy-llm-keys: strips the now-deprecated keys from config.json after callers stop reading them (idempotent)

Both have test coverage in assistant/src/__tests__/workspace-migration-{038,039}-*.test.ts.

Part of plan: unify-llm-callsites.md

…efines (#26089)

* config(llm): add unified llm schema with call-site enum and profile refines
* fix(llm-schema): replace deepPartialObject helper with explicit .partial().extend()

  Zod 4's readonly shape typing tripped TS2542 in the LSP for the generic walker. Inline the one-level expansion for ContextWindowSchema and switch the superRefine issue code to the string literal (Zod 4 deprecated ZodIssueCode).

* config(llm): add resolveCallSiteConfig resolver with deep merge
* fix(llm-resolver): deep-clone nested objects so resolved configs are isolated snapshots

  Codex flagged that the merge helper aliased nested objects from llm.default when no override touched them, so a caller mutating the returned config would silently corrupt the source. Recurse into plain-object sources unconditionally and add a regression test.

…ge) (#26095)

* config(llm): add llm field to AssistantConfigSchema (no behavior change)
* fix(llm-schema): add field-level defaults so partial llm configs don't trigger full config reset

  Codex flagged that requiring all LLMConfigBase fields meant the loader's leaf-deletion recovery couldn't repair partial/invalid llm blocks, falling through to cloneDefaultConfig() and discarding the user's other valid settings. Add .default(...) to every leaf so LLMSchema.parse({}) returns a fully-defaulted object, matching the pattern used by sibling config schemas.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lSiteConfig (#26102)

…re (#26101)

* workspace: migrate scattered LLM config keys into unified llm structure
* fix(migration): preserve existing llm subtree; map notification intent to both call sites

  Codex flagged two issues:
  - The migration assignment replaced config.llm wholesale, destroying any pre-existing llm.callSites/profiles when llm.default was absent. Now merges into existing config.llm, preserving non-conflicting entries.
  - notifications.decisionModelIntent drives both notification classification and preference extraction, but the migration only seeded notificationDecision. Now seeds both call sites.

#26106)

…ite IDs (#26107)

…site IDs (#26109)

…te IDs (#26105)

#26111)

…ault.* (#26113)

…ite IDs (#26112)

…Ds (#26108)

…llbacks (#26115)

* daemon: thread callSite through processMessage options and adapter callbacks
* fix(callsite-threading): complete interface contract and server.ts symmetry

  Devin flagged two gaps in PR #26115:
  - ProcessConversationContext interface missing callSite in its runAgentLoop options type (works via structural typing but contract was incomplete; mocks would silently drop the field).
  - DaemonServer.persistAndProcessMessage didn't thread callSite to conversation.runAgentLoop, while DaemonServer.processMessage did. Aligned.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(callsite): don't default unspecified callers to 'mainAgent'

  Codex flagged that defaulting to mainAgent for every turn routes them through the new RetryProvider call-site resolver, which reads from llm.default, but config-model.setModel still writes to services.inference without syncing llm.default. Result: stale/incompatible model IDs after a model switch.

  Defer the cutover. agent-loop turns now keep using the legacy modelIntent path (turnCallSite = options?.callSite, no fallback). PRs 7-11 still explicitly pass callSite and route through the new resolver as intended.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…6125)

…ation' (#26126)

#26122)

…26128)

* macos(settings): add SettingsStore APIs for per-call-site overrides
* fix(callsite-overrides): harden setCallSiteOverrides against dup-id crash and batch divergence

  Devin and Codex flagged two issues:
  - Dictionary(uniqueKeysWithValues:) crashes if callers pass duplicate CallSiteOverride.id values (external input, so it must be tolerant). Switch to Dictionary(_:uniquingKeysWith:) with last-write-wins.
  - Batch updates locally cleared entries omitted from the input but only PATCHed entries that were present, so omitted entries appeared cleared in the UI but reappeared on next sync. Now the PATCH payload includes NSNull clears for every catalog entry not in the batch, aligning remote with local.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(callsite-overrides): null entire entry on clear so non-UI leaves get cleared too

  Codex P2 (PR #26128 cycle 2): clearCallSiteOverride only nulled provider/model/profile, but call-site config supports additional leaves (maxTokens, effort, speed, thinking, contextWindow). If those were set via manual edits, the UI would report cleared while the daemon kept applying hidden overrides.

  Switch the PATCH payload from { provider: null, model: null, profile: null } to a single null on the entry itself. The Zod fragment treats null as absent, so the resolver falls back to llm.default. Same fix applies to the omitted-catalog-entry clears in setCallSiteOverrides batch. Tests updated to assert the new shape.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
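The two batch fixes above (last-write-wins on duplicate IDs, and entry-level null clears for every catalog entry omitted from the batch) can be illustrated with a small payload builder. The real code is Swift in SettingsStore; this TypeScript analogue uses hypothetical names (`buildOverridesPatch`, `OverridesPatch`) and assumes batch entries outside the catalog are simply ignored.

```typescript
type CallSiteOverride = { id: string; provider?: string; model?: string; profile?: string };
type OverridesPatch = Record<string, Omit<CallSiteOverride, "id"> | null>;

// Build the llm.callSites portion of a PATCH body for a batch update.
function buildOverridesPatch(
  catalog: readonly string[],
  batch: readonly CallSiteOverride[],
): OverridesPatch {
  // Duplicate ids are tolerated: a later entry replaces an earlier one.
  const byId = new Map<string, CallSiteOverride>();
  for (const override of batch) byId.set(override.id, override);

  const patch: OverridesPatch = {};
  for (const id of catalog) {
    const override = byId.get(id);
    if (override === undefined) {
      // Omitted from the batch: clear the whole entry, not individual fields,
      // so leaves the UI does not expose (maxTokens, effort, ...) go too.
      patch[id] = null;
      continue;
    }
    const { id: _id, ...fields } = override;
    patch[id] = fields;
  }
  return patch;
}

const patch = buildOverridesPatch(
  ["commitMessage", "heartbeatAgent", "filingAgent"],
  [
    { id: "commitMessage", provider: "openai" },
    { id: "commitMessage", provider: "anthropic" }, // duplicate id, wins
  ],
);
console.log(JSON.stringify(patch));
// {"commitMessage":{"provider":"anthropic"},"heartbeatAgent":null,"filingAgent":null}
```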

…ides exist (#26133)

…st sheet (#26135)

* macos(settings): show 'N call-site overrides' badge with read-only list sheet
* fix(comments): drop PR-number breadcrumbs in callsite override files

  Devin flagged that comments referencing PR 22/23/24 violate clients/AGENTS.md 'Comment Quality' rule (no breadcrumbs). Replaced with timeless descriptions of code intent.

…model pickers (#26136)

* macos(settings): make per-task override sheet editable with provider/model pickers
* fix(callsite-sheet): preserve external updates and seed override from active default provider

  Codex flagged two P1s:
  - syncDraftsFromStore compared drafts against the NEW persisted value to decide 'touched', so external store updates were treated as user edits and got overwritten by Save All. Track the previously-persisted value in lastSyncedFromStore and consider a row touched only when the draft differs from that baseline.
  - Toggling 'Override default' on initialized provider from providerIds.first instead of the user's actual default provider, which could pin the wrong provider on save. Pass the user's default provider into CallSiteOverrideRow and seed from it.
* fix(callsite-sheet): use entry-level null path for cleared rows in saveAll/resetAll

  Devin flagged that saveAll() and resetAll() were passing all-nil entries to setCallSiteOverrides, which routed them through the field-level null path (provider/model/profile = null). That left advanced leaves (maxTokens, effort, temperature, contextWindow) untouched on the daemon.

  Fix:
  - saveAll(): filter to entries with hasOverride == true; toggled-off rows fall through to the entry-level null path.
  - resetAll(): pass an empty list so every catalog entry hits the entry-level null path.
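The 'touched' rule from the first fix above (compare each draft against the previously synced baseline, not against the incoming value) can be sketched as below. This is a TypeScript analogue of Swift UI code with hypothetical types; in particular, advancing the baseline for touched rows is an assumption, not something the commit message states.

```typescript
type Draft = { provider: string | null; model: string | null };

type SheetState = {
  drafts: Map<string, Draft>;
  // What the store held the last time we synced; the baseline for "touched".
  lastSyncedFromStore: Map<string, Draft>;
};

const sameDraft = (a: Draft, b: Draft): boolean =>
  a.provider === b.provider && a.model === b.model;

function syncDraftsFromStore(state: SheetState, persisted: ReadonlyMap<string, Draft>): void {
  persisted.forEach((incoming, id) => {
    const draft = state.drafts.get(id);
    const baseline = state.lastSyncedFromStore.get(id);
    // A row is touched only if the user moved it away from the old baseline.
    const touched =
      draft !== undefined && baseline !== undefined && !sameDraft(draft, baseline);
    if (!touched) state.drafts.set(id, { ...incoming });
    state.lastSyncedFromStore.set(id, { ...incoming });
  });
}

const initial: Draft = { provider: "anthropic", model: "claude-opus-4-7" };
const state: SheetState = {
  drafts: new Map([
    ["commitMessage", { provider: "anthropic", model: "claude-haiku-4-5-20251001" }], // user edit
    ["heartbeatAgent", { ...initial }], // untouched
  ]),
  lastSyncedFromStore: new Map([
    ["commitMessage", { ...initial }],
    ["heartbeatAgent", { ...initial }],
  ]),
};

// An external update arrives for both rows.
const external: Draft = { provider: "openai", model: null };
syncDraftsFromStore(
  state,
  new Map([
    ["commitMessage", external],
    ["heartbeatAgent", external],
  ]),
);
console.log(state.drafts.get("commitMessage")); // user edit kept
console.log(state.drafts.get("heartbeatAgent")); // external value adopted
```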

…rite (#26153)

#26154)

… contract (#26155)

… update docs (#26157)

…der docstring (#26158)

devin-ai-integration

Devin Review found 4 potential issues.


devin-ai-integration · 2026-04-17T02:51:39Z

+        if lastDaemonProvider == nil {
+            if let provider = llmDefault?["provider"] as? String {
+                self.selectedInferenceProvider = provider
+            }
+            if let model = llmDefault?["model"] as? String {
+                self.selectedModel = model
+            }
        }


📝 Info: macOS SettingsStore loadServiceModes reads from llm.default with services.inference fallback

The loadServiceModes change at SettingsStore.swift reads provider/model from llm.default.* first, falling back per-field to services.inference.*. The per-field fallback (rather than block-level) means a partial llm.default block (e.g., only provider set, no model) correctly takes provider from llm.default and model from services.inference. This covers: (1) fully-migrated configs where llm.default has everything, (2) unmigrated configs where only services.inference exists, and (3) edge cases where migration 038 ran but the config was partially written. The services.inference.mode field correctly stays under services since it governs managed-vs-your-own routing, orthogonal to model selection.


chatgpt-codex-connector

💡 Codex Review

vellum-assistant/assistant/src/agent/loop.ts

Lines 243 to 245 in 3bb54d3

    
    const providerConfig: Record<string, unknown> = {
      max_tokens: turnMaxTokens,
    };

Let call-site maxTokens override AgentLoop defaults

AgentLoop.run always sets config.max_tokens before sending the request. In RetryProvider.normalizeSendMessageOptions, call-site resolution only writes resolved.maxTokens when max_tokens is undefined, so llm.callSites.<id>.maxTokens can never take effect for AgentLoop turns (including mainAgent and other loop-based call sites). As a result, the new per-call-site token-cap setting is ignored in these paths.
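The precedence problem described here can be reproduced in a few lines. The helper below is a hypothetical stand-in for the relevant step of RetryProvider.normalizeSendMessageOptions, written only from the behavior the review describes (fill max_tokens from the call site when the caller left it unset); the 8192 value is illustrative.

```typescript
type SendOptions = { max_tokens?: number };
type ResolvedCallSite = { maxTokens: number };

// Fill-when-unset: the call-site value is only a fallback.
function fillMaxTokensWhenUnset(options: SendOptions, site: ResolvedCallSite): SendOptions {
  return { ...options, max_tokens: options.max_tokens ?? site.maxTokens };
}

const siteConfig: ResolvedCallSite = { maxTokens: 120 };

// A bespoke caller that leaves max_tokens unset picks up the call-site cap.
const bespokeCall = fillMaxTokensWhenUnset({}, siteConfig);

// An AgentLoop turn always sets max_tokens first, so the cap never applies.
const agentLoopTurn = fillMaxTokensWhenUnset({ max_tokens: 8192 }, siteConfig);

console.log(bespokeCall.max_tokens); // 120
console.log(agentLoopTurn.max_tokens); // 8192
```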


chatgpt-codex-connector · 2026-04-17T02:53:24Z

      const createPromise = (async () => {
        const config = getConfig();
-        let provider = getProvider(config.services.inference.provider);
+        let provider = getProvider(config.llm.default.provider);


Resolve conversation provider from the active call-site

getOrCreateConversation always constructs Conversation with config.llm.default.provider, even when the queued turn carries options.callSite (e.g. heartbeat, filing, analyze, or any per-task override). Because the provider transport is fixed at construction time, llm.callSites.<id>.provider overrides are never honored for AgentLoop-backed flows; the later callSite threading only adjusts request metadata, not which provider client is used. This makes provider overrides in the new per-call-site config/UI silently no-op whenever they differ from the default provider.


…restore SettingsStore fallback (#26252)

…ugh callSite (#26254)

devin-ai-integration

Devin Review found 2 new potential issues.


devin-ai-integration · 2026-04-17T23:16:35Z

+// Helpers — self-contained per workspace migrations AGENTS.md
+// ---------------------------------------------------------------------------
+
+const EFFORT_VALUES = new Set(["low", "medium", "high", "max"]);


🔴 Migration 038 EFFORT_VALUES set missing "xhigh", silently drops user's effort setting

The EFFORT_VALUES set in migration 038 is ["low", "medium", "high", "max"] but the runtime EffortEnum schema (assistant/src/config/schemas/llm.ts:73) includes "xhigh". When readEnum(config.effort, EFFORT_VALUES) encounters "xhigh", it returns undefined, and the ?? "max" fallback at assistant/src/workspace/migrations/038-unify-llm-callsite-configs.ts:84 replaces it. Since migration 039 then deletes the original top-level effort key, the user's "xhigh" setting is permanently lost and silently replaced with "max".

Suggested change

-const EFFORT_VALUES = new Set(["low", "medium", "high", "max"]);
+const EFFORT_VALUES = new Set(["low", "medium", "high", "xhigh", "max"]);
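To make the failure mode concrete, here is a minimal sketch of an allow-list reader followed by a `?? "max"` fallback. The `readEnum` body is an assumption (the migration's helper is not shown in this thread); only its call shape and the two sets come from the finding above.

```typescript
// Hypothetical reader: return the value only if it is a string in the allow-list.
function readEnum<T extends string>(value: unknown, allowed: ReadonlySet<T>): T | undefined {
  return typeof value === "string" && allowed.has(value as T) ? (value as T) : undefined;
}

const EFFORT_VALUES_BEFORE = new Set(["low", "medium", "high", "max"]);
const EFFORT_VALUES_AFTER = new Set(["low", "medium", "high", "xhigh", "max"]);

const legacyConfig = { effort: "xhigh" };

// Without "xhigh" in the set, the reader returns undefined and the fallback
// silently replaces the user's setting.
const migratedBefore = readEnum(legacyConfig.effort, EFFORT_VALUES_BEFORE) ?? "max";

// With the suggested set, the explicit value survives the migration.
const migratedAfter = readEnum(legacyConfig.effort, EFFORT_VALUES_AFTER) ?? "max";

console.log(migratedBefore); // "max"
console.log(migratedAfter); // "xhigh"
```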


devin-ai-integration · 2026-04-17T23:16:36Z

+    const parsed = LLMSchema.parse({});
+    expect(parsed.default).toEqual({
+      provider: "anthropic",
+      model: "claude-opus-4-6",


🚩 Migration 038 default model (claude-opus-4-6) vs schema default (claude-opus-4-7) discrepancy

The migration at assistant/src/workspace/migrations/038-unify-llm-callsite-configs.ts:82 defaults llm.default.model to "claude-opus-4-6" for configs with no explicit model, while LLMConfigBase at assistant/src/config/schemas/llm.ts:222 defaults to "claude-opus-4-7". This means existing installs that never set a model explicitly will be pinned to claude-opus-4-6 by the migration, while fresh installs default to claude-opus-4-7. The test at assistant/src/__tests__/llm-schema.test.ts:73 also expects "claude-opus-4-6" from LLMSchema.parse({}) which contradicts the actual schema default of "claude-opus-4-7" — that test will fail in CI. The migration discrepancy may be intentional (frozen snapshot), but the test vs schema mismatch is a clear error that CI should catch.


…view comments (#26258)

…anaged-picker)

… (#26270)

…ation gaps (#26271)

* Fix Chrome extension allowlist ID and clarify README dev setup (#26259)

  Update the canonical allowlist to use the correct published CWS extension ID (hphbdmpffeigpcdjkckleobjmhhokpne). Restructure the Chrome extension README to clearly explain the allowlist merge strategy, separate the macOS app (automatic) path from the manual native messaging setup, and show how dev + prod extensions work side-by-side.

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(clients): enable non-contiguous glyph layout for NSTextView-backed code views (#26242)

  TextKit 1 defaults NSLayoutManager.allowsNonContiguousLayout to false, which forces full-document glyph layout from character 0 on the main thread whenever a glyph range is queried. Attaching an NSTextView to its scroll view (setDocumentView: -> _setSuperview: -> setNeedsDisplayInRect: -> _glyphRangeForBoundingRect:) triggers that query during makeNSView, producing multi-second hangs on large code blocks.

  Opt into non-contiguous layout on every TextKit 1 stack we build via NSViewRepresentable so glyph generation is confined to the requested bounding rect. Also replace NSLayoutManager.ensureLayout(for:) in the code-view sizeThatFits paths with direct lineCount * fixedLineHeight math: the text container is unbounded horizontally (no wrapping) and paragraph style pins minimumLineHeight == maximumLineHeight, so the geometry is exact and avoids a second O(glyph count) main-thread path.

  Fixes VELLUM-ASSISTANT-MACOS-J2.

  Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
* fix(contacts): show Assistant badge for assistant-type contacts (LUM-1009) (#26239)

  * fix(contacts): show Assistant badge for assistant-type contacts (LUM-1009)
  * Move role/contactType derivation onto Kind for valid initializer

  ---------

  Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* fix(llm-callsite): UI override state divergence, null-as-delete, migration gaps

  - deepMergeOverwrite: null on scalar/null targets assigns null (preserves nullable config fields like activeHoursStart); null on object targets still deletes (call-site clearing). Fixes regression where PATCH with null for nullable fields was deleted then re-defaulted.
  - InferenceServiceCard: override confirmation dialog only fires when the resolved provider ID actually changes, not on mode-only toggles where both old and new resolve to the same provider.
  - CallSiteOverridesSheet: per-row Save uses replaceCallSiteOverride (clear-then-set) so stale daemon-side leaves are removed. The partial-update setCallSiteOverride would retain fields the draft nil'd.
  - CallSiteOverrideRow: merge consecutive .padding modifiers into single EdgeInsets call per macOS AGENTS.md layout rule.
  - SettingsStore: add replaceCallSiteOverride for full-entry replacement.

---------

Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

…rovider routing (#26275)

* fix(meet-bot): address review feedback — Docker build, scraper races, audio capture, storage writer (#26264)
* fix(meet): chat concurrency, dispose teardown, and wake adapter fidelity (#26265)
* fix: heartbeat dual-emit, analysis dedup, test hermiticity, credential executor discovery (#26266)
* fix: model default fallback, empty-response nudge scan (#26268)

  - Update FALLBACK_DEFAULT_MODEL to claude-opus-4-7 + test
  - Fix resolveModel to check Anthropic catalog (not just current default) so stale persisted defaults (e.g. claude-opus-4-6) don't get sent to non-Anthropic providers
  - Fix priorAssistantHadVisibleText backward scan to check ALL prior assistant messages, not just the most recent one

  Addresses review feedback from PRs #26247, #26164.
* fix(meet): TTS stream races, barge-in tracking, ffmpeg error classification (#26267)
* Fix extension-id-sync-guard test after canonical ID update (#26263)

  The guard test asserts that canonical extension IDs appear only in the allowlist config file. After updating the canonical ID to match the published CWS extension, it now collides with CWS URLs in README and browser-execution.ts. Fix by stripping CWS URLs before checking for bare ID occurrences, and ignore .codex-worktrees (repo copies). Also remove hardcoded CWS ID from README in favor of reading from the canonical config.

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(llm-callsite): seed latency-optimized defaults, fix guardian provider routing, clean stale comments

  - Add LATENCY_OPTIMIZED_CALLSITE_DEFAULTS to schema for new installs
  - Create migration 040 to seed latency-optimized call-site entries for existing workspaces
  - Fix guardian-action-generators to use getConfiguredProvider() instead of bypassing call-site resolution
  - Restore commitMessage maxTokens: 120 and temperature: 0.2 via call-site defaults
  - Remove stale PR-reference comments from analyze-conversation.ts and voice-session-bridge.ts

  Addresses consolidated review feedback from PRs #26101-#26140.

---------

Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

siddseethepalli · 2026-04-17T23:51:56Z

Follow-up PR #26275 merged — addresses consolidated review feedback from the callsite unification PRs:

Latency-optimized routing restored: Migration 038 only wrote callSite entries for explicit legacy overrides. Most users had none → latency-sensitive call sites fell through to llm.default (opus). Fixed with schema defaults (new installs) + migration 040 (existing workspaces).
Guardian provider routing fixed: guardian-action-generators.ts bypassed call-site provider resolution. Now uses getConfiguredProvider("guardianQuestionCopy").
Commit message token cap restored: maxTokens: 120 + temperature: 0.2 seeded via call-site defaults.
Stale PR-reference comments removed from analyze-conversation.ts and voice-session-bridge.ts.

devin-ai-integration

Devin Review found 3 new potential issues.


devin-ai-integration · 2026-04-18T00:00:03Z

    if (startNull !== endNull) {
-      // Emit on both fields so validateWithSchema's delete-and-retry strips
-      // both sides in one pass. Single-emit on the null side can cascade when
-      // the explicit value happens to equal the opposite default (e.g.
-      // { start: null, end: 8 } → strip start → default 8 → equal check fires
-      // → loader falls back to full defaults, wiping unrelated keys like
-      // maxTokens).
+      // Emit only on the null side so validateWithSchema's delete-and-retry
+      // preserves the explicit non-null value. Dual-emit would delete both
+      // keys, losing valid explicit values for mixed-null configs like
+      // { activeHoursStart: null, activeHoursEnd: 20 } → (8, 22) instead of
+      // retaining the explicit 20.
      const message =
        "heartbeat.activeHoursStart and heartbeat.activeHoursEnd must both be set or both be null";
      ctx.addIssue({
        code: z.ZodIssueCode.custom,
-        path: ["activeHoursStart"],
-        message,
-      });
-      ctx.addIssue({
-        code: z.ZodIssueCode.custom,
-        path: ["activeHoursEnd"],
+        path: [startNull ? "activeHoursStart" : "activeHoursEnd"],
        message,
      });
      return;


🔴 Heartbeat superRefine single-emit causes cascade to full config defaults when explicit value equals opposite default

The change from dual-emit to single-emit in the heartbeat superRefine re-introduces a config-wipe cascade for specific activeHours configurations. When a user has e.g. { heartbeat: { activeHoursStart: null, activeHoursEnd: 8 } }, the single-emit path emits an issue only on activeHoursStart (the null side). The loader's validateWithSchema (assistant/src/config/loader.ts:70-88) deletes that key and retries. On retry, activeHoursStart defaults to 8 (its schema default), creating start=8, end=8 — which triggers the equal-hours check, emitting a new issue. Since the loader performs only one delete-and-retry pass, this second failure cascades to cloneDefaultConfig(), wiping all unrelated user settings (e.g. llm.default.maxTokens). The old dual-emit approach deleted both sides in one pass, so the retry got (8, 22) — valid — and unrelated fields survived. The same cascade affects { activeHoursStart: 22, activeHoursEnd: null } and { activeHoursStart: N, activeHoursEnd: N } for values equal to the opposite side's default. The existing tests at assistant/src/__tests__/config-schema.test.ts:2316-2356 still expect maxTokens to be preserved, but the mechanical behavior under single-emit would wipe it.

Prompt for agents

The heartbeat superRefine was changed from dual-emit (emitting Zod issues on both activeHoursStart and activeHoursEnd) to single-emit (only on the null side). This introduces a cascade: when the non-null explicit value equals the opposite side's schema default, the loader's single delete-and-retry pass cannot recover — it deletes the null side, the default recreates an equal-hours mismatch, and the retry fails, causing cloneDefaultConfig() to wipe unrelated settings. The fix should restore dual-emit for the null-mismatch branch (and keep dual-emit for the equal-hours branch) so both sides are deleted in one pass and the retry succeeds with both values at their defaults. This is the behavior the tests at config-schema.test.ts:2316-2356 expect. Alternatively, if the single-emit behavior is desired (to preserve explicit values), the loader in config/loader.ts would need to be enhanced to support multiple rounds of delete-and-retry, but that is a larger change.
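The cascade described above can be walked through with a small simulation. This sketch does not use Zod or the real loader; `validate` and `loadHeartbeat` are hypothetical stand-ins that model only what the finding states: schema defaults of (8, 22), a null-mismatch check, an equal-hours check, and a loader that performs exactly one delete-and-retry pass before falling back to full defaults.

```typescript
type RawHeartbeat = { activeHoursStart?: number | null; activeHoursEnd?: number | null };
type Heartbeat = { activeHoursStart: number | null; activeHoursEnd: number | null };
type Key = keyof Heartbeat;

const DEFAULTS: Heartbeat = { activeHoursStart: 8, activeHoursEnd: 22 };

// Apply defaults to missing keys, then report which paths carry issues.
function validate(raw: RawHeartbeat, dualEmit: boolean): { value: Heartbeat; issues: Key[] } {
  const value: Heartbeat = {
    activeHoursStart:
      raw.activeHoursStart === undefined ? DEFAULTS.activeHoursStart : raw.activeHoursStart,
    activeHoursEnd:
      raw.activeHoursEnd === undefined ? DEFAULTS.activeHoursEnd : raw.activeHoursEnd,
  };
  const startNull = value.activeHoursStart === null;
  const endNull = value.activeHoursEnd === null;
  let issues: Key[] = [];
  if (startNull !== endNull) {
    issues = dualEmit
      ? ["activeHoursStart", "activeHoursEnd"]
      : [startNull ? "activeHoursStart" : "activeHoursEnd"];
  } else if (!startNull && value.activeHoursStart === value.activeHoursEnd) {
    issues = ["activeHoursStart", "activeHoursEnd"]; // equal-hours branch
  }
  return { value, issues };
}

// One delete-and-retry pass; a second failure falls back to full defaults
// (in the real loader that would wipe unrelated settings as well).
function loadHeartbeat(
  raw: RawHeartbeat,
  dualEmit: boolean,
): { value: Heartbeat; fellBackToDefaults: boolean } {
  const first = validate(raw, dualEmit);
  if (first.issues.length === 0) return { value: first.value, fellBackToDefaults: false };
  const repaired: RawHeartbeat = { ...raw };
  for (const key of first.issues) delete repaired[key];
  const second = validate(repaired, dualEmit);
  if (second.issues.length === 0) return { value: second.value, fellBackToDefaults: false };
  return { value: { ...DEFAULTS }, fellBackToDefaults: true };
}

const risky: RawHeartbeat = { activeHoursStart: null, activeHoursEnd: 8 };
console.log(loadHeartbeat(risky, false)); // single-emit: retry sees (8, 8) and falls back
console.log(loadHeartbeat(risky, true)); // dual-emit: retry sees (8, 22) and succeeds
```

The trade-off noted in the diff comment also shows up here: for `{ activeHoursStart: null, activeHoursEnd: 20 }`, single-emit keeps the explicit 20 while dual-emit resets both sides to (8, 22).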


devin-ai-integration · 2026-04-18T00:00:05Z

+const LATENCY_OPTIMIZED_FRAGMENT = {
+  model: "claude-haiku-4-5-20251001",
+  effort: "low" as const,
+  thinking: { enabled: false },
+};
+
+export const LATENCY_OPTIMIZED_CALLSITE_DEFAULTS: Partial<
+  Record<LLMCallSite, z.input<typeof LLMCallSiteConfig>>
+> = {
+  guardianQuestionCopy: LATENCY_OPTIMIZED_FRAGMENT,
+  watchCommentary: LATENCY_OPTIMIZED_FRAGMENT,
+  interactionClassifier: LATENCY_OPTIMIZED_FRAGMENT,
+  skillCategoryInference: LATENCY_OPTIMIZED_FRAGMENT,
+  inviteInstructionGenerator: LATENCY_OPTIMIZED_FRAGMENT,
+  notificationDecision: LATENCY_OPTIMIZED_FRAGMENT,
+  preferenceExtraction: LATENCY_OPTIMIZED_FRAGMENT,
+  commitMessage: {
+    ...LATENCY_OPTIMIZED_FRAGMENT,
+    maxTokens: 120,
+    temperature: 0.2,
+  },
+};


🚩 LATENCY_OPTIMIZED_CALLSITE_DEFAULTS hardcodes Anthropic model IDs in schema defaults

The LATENCY_OPTIMIZED_CALLSITE_DEFAULTS at assistant/src/config/schemas/llm.ts:269-290 hardcodes model: 'claude-haiku-4-5-20251001' for all latency-optimized call sites. This model ID is Anthropic-specific. When llm.default.provider is set to a non-Anthropic provider (e.g., OpenAI), these schema-level defaults will seed call-site entries with an Anthropic model ID that doesn't exist on the target provider. The migration 040-seed-latency-callsite-defaults.ts correctly resolves provider-appropriate models via its PROVIDER_LATENCY_MODELS table, but the schema defaults are static and always Anthropic. Users on non-Anthropic providers who haven't run migration 040 (or who add new call sites after migration) would get Anthropic model IDs in their call-site defaults. The resolveModel function in assistant/src/providers/registry.ts:85-88 does catch Anthropic models on non-Anthropic providers and substitutes a fallback, which mitigates the issue at the provider initialization level.


…t body (#26280)

devin-ai-integration

Devin Review found 1 new potential issue.


devin-ai-integration · 2026-04-18T01:07:10Z

+        provider = new CallSiteRoutingProvider(provider, (name) => {
+          try {
+            return getProvider(name);
+          } catch {
+            return undefined;
+          }
+        });


🚩 Alternative providers from CallSiteRoutingProvider bypass RateLimitProvider

In assistant/src/daemon/server.ts:1048-1061, the provider stack for conversations is RateLimitProvider(CallSiteRoutingProvider(RetryProvider(defaultProvider))). When CallSiteRoutingProvider routes to an alternative provider via getProviderByName, that alternative comes directly from the registry (getProvider(name)) at line 1050 and is only wrapped in RetryProvider — NOT in RateLimitProvider. This means per-call-site provider overrides that route to a different transport bypass the conversation's rate limiter. Whether this is intentional (different providers have separate rate budgets) or a gap depends on the desired rate-limiting semantics. The same pattern appears in assistant/src/subagent/manager.ts:224-229.


…t session-manager error split)

PR #26159 (LLM callsite unification) accidentally reverted two earlier fixes when squash-merged from a stale base: 1. Re-applies the heartbeat dual-emit superRefine from #26278 so validateWithSchema's delete-and-retry strips both activeHours sides in one pass. Single-emit cascades when the explicit value equals the opposite default (e.g. { start: null, end: 8 }), causing the loader to fall back to full defaults and wipe unrelated fields like llm.default.maxTokens. 2. Changes LLMSchema.callSites to default to {} instead of LATENCY_OPTIMIZED_CALLSITE_DEFAULTS. Latency-optimized call-site defaults are owned by workspace migration 040 (which seeds them into the user's on-disk config), not the schema layer. Leaving the schema default populated polluted parsed configs with unrequested entries and broke the 'empty callSites by default' invariant that tests and downstream callers rely on. LATENCY_OPTIMIZED_FRAGMENT / LATENCY_OPTIMIZED_CALLSITE_DEFAULTS are now unused and removed.


… llm.default API The tests were added in #26251 against the old `setInferenceProvider` / `services.inference.provider` API. #26159 merged afterward renamed that API to `setLLMDefaultProvider` and moved the config path to `llm.default.provider`, leaving the tests unable to compile. Rename the calls and update the patch assertions to match the new shape.


siddseethepalli and others added 30 commits April 16, 2026 15:47

providers: accept callSite in per-call config; resolve via resolveCallSiteConfig (#26102)

3a3d52f

memory: route extraction/consolidation/retrieval through call-site IDs (#26106)

5eb05dc

memory: route narrative/pattern/summarization/starters through call-site IDs (#26107)

051517f

notifications: route decision and preference extraction through call-site IDs (#26109)

42a1139

calls+watcher: route guardian copy and watch handlers through call-site IDs (#26105)

8ce9500

utility: route classifier and analyzer LLM calls through call-site IDs (#26111)

fe3977f

macos(settings): migrate InferenceServiceCard reads/writes to llm.default.* (#26113)

190ae0e

workspace+conversation: route commit message and title through call-site IDs (#26112)

8de79e1

ui: route identity intro and empty-state greeting through call-site IDs (#26108)

1d06d40

heartbeat: pass callSite: 'heartbeatAgent' instead of speed kwarg (#26125)

b2ab37b

filing: pass callSite: 'filingAgent' instead of speed kwarg (#26124)

6d401ca

runtime/analyze-conversation: route through callSite: 'analyzeConversation' (#26126)

8dabf84

subagent: pass callSite: 'subagentSpawn' when spawning isolated agents (#26122)

49dfa61

calls: route the call agent loop through callSite: 'callAgent' (#26123)

6d0b464

macos(settings): confirm default-provider switch when call-site overrides exist (#26133)

4bd1381

config(llm): remove deprecated scattered LLM keys (#26140)

8264d24

fix(config-loader): treat JSON null as key deletion in deepMergeOverwrite (#26153)

d987957

fix(agent-loop): default user-initiated turns to callSite: 'mainAgent' (#26154)

5664ccb

fix(meet-join): migrate consent-monitor + session-manager to callSite contract (#26155)

3750648

fix(macos): atomic provider+model save via single PATCH (#26156)

b14dc46

fix(cleanup): remove dead code, refresh comments, add migration test, update docs (#26157)

fa742d4

fix(r2): catalog test count, skill self-knowledge doc, AGENTS.md, loader docstring (#26158)

3bb54d3

siddseethepalli requested a review from awlevin as a code owner April 17, 2026 02:47

siddseethepalli self-assigned this Apr 17, 2026

devin-ai-integration Bot reviewed Apr 17, 2026


chatgpt-codex-connector Bot reviewed Apr 17, 2026


fix(llm-callsite): refresh stale docstring, restore overflow budget, restore SettingsStore fallback (#26252)

2230643

siddseethepalli mentioned this pull request Apr 17, 2026

fix(llm-callsite): route provider transport and field precedence through callSite #26254

Merged


fix(llm-callsite): route provider transport and field precedence through callSite (#26254)

b1193bf


Merge origin/main into feature branch (resolve 5 conflicts)

8afa7ef

devin-ai-integration Bot reviewed Apr 17, 2026


siddseethepalli and others added 5 commits April 17, 2026 19:31

fix(llm-callsite): pass CI + address subagent/thinking/temperature review comments (#26258)

784e7e2

Merge origin/main into feature branch (resolve InferenceServiceCard managed-picker)

c37693e

test(extension-id-guard): allow CWS URL matches; mirrors main PR #26263 (#26270)

abe76b2

devin-ai-integration Bot reviewed Apr 18, 2026


fix(retry): stop forwarding contextWindow/provider to provider request body (#26280)

eaf405b

devin-ai-integration Bot reviewed Apr 18, 2026


siddseethepalli added 2 commits April 18, 2026 02:13

Merge origin/main into feature branch (ext-id regex, README path, meet session-manager error split)

c4cac68

chore(skills): regenerate catalog.json

1057e97

siddseethepalli merged commit f8e0abd into main Apr 18, 2026
14 of 15 checks passed

siddseethepalli deleted the siddseethepalli/unify-llm-callsites branch April 18, 2026 02:15

siddseethepalli mentioned this pull request Apr 18, 2026

fix(config): restore heartbeat dual-emit and empty llm.callSites default #26286

Merged

siddseethepalli mentioned this pull request Apr 18, 2026

fix(macos): update inference selection tests to new llm.default API #26287

Merged

	const providerConfig: Record<string, unknown> = {
	max_tokens: turnMaxTokens,
	};

-	const EFFORT_VALUES = new Set(["low", "medium", "high", "max"]);
+	const EFFORT_VALUES = new Set(["low", "medium", "high", "xhigh", "max"]);

