utility: route classifier and analyzer LLM calls through call-site IDs #26111
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 15cb1758aa
  {
    config: {
-     modelIntent: "latency-optimized",
+     callSite: "interactionClassifier",
Route call-site requests through matching provider selection
This call now opts into config.callSite, but provider resolution in these paths still uses getConfiguredProvider()/resolveConfiguredProvider() without a call-site argument. That means transport selection still follows services.inference.provider, while RetryProvider.normalizeViaCallSite() rewrites the model from resolveCallSiteConfig(callSite, getConfig().llm). If those two configs diverge (which can happen after setModel, which only updates services.inference.*), the request can be sent to the wrong provider with an incompatible model ID (e.g. OpenAI transport + Claude model), causing hard failures and fallback behavior.
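The divergence described above can be illustrated with a minimal sketch. The config shapes, the helper names (`legacyProvider`, `resolveCallSite`, `isMismatched`) and the placeholder Claude model ID are assumptions made for illustration, not the repo's actual types or functions:

```typescript
// Hypothetical, simplified shapes; the real config types in the repo may differ.
type ProviderId = "anthropic" | "openai";

interface CallSiteEntry {
  provider?: ProviderId;
  model?: string;
}

interface SketchConfig {
  services: { inference: { provider: ProviderId; model: string } };
  llm: {
    default: { provider: ProviderId; model: string };
    callSites: Record<string, CallSiteEntry>;
  };
}

// Legacy path: the transport is chosen from services.inference only.
function legacyProvider(cfg: SketchConfig): ProviderId {
  return cfg.services.inference.provider;
}

// Call-site path: provider and model come from llm.callSites, falling back to llm.default.
function resolveCallSite(
  cfg: SketchConfig,
  callSite: string,
): { provider: ProviderId; model: string } {
  const override: CallSiteEntry = cfg.llm.callSites[callSite] ?? {};
  return {
    provider: override.provider ?? cfg.llm.default.provider,
    model: override.model ?? cfg.llm.default.model,
  };
}

// True when the transport and the resolved model would belong to different providers.
function isMismatched(cfg: SketchConfig, callSite: string): boolean {
  return legacyProvider(cfg) !== resolveCallSite(cfg, callSite).provider;
}

// State after a setModel-style update that only touched services.inference.*.
const diverged: SketchConfig = {
  services: { inference: { provider: "openai", model: "gpt-4o" } },
  llm: {
    default: { provider: "anthropic", model: "claude-model-id" }, // placeholder ID
    callSites: {},
  },
};

const mismatch = isMismatched(diverged, "interactionClassifier");
```

In this sketch `mismatch` is true: the request would go out over the OpenAI transport while carrying the model resolved from `llm.default`.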
  undefined,
  undefined,
- { signal, config: { modelIntent: "latency-optimized" } },
+ { signal, config: { callSite: "inviteInstructionGenerator" } },
Keep invite generation on latency-optimized defaults
Replacing the explicit modelIntent: "latency-optimized" with callSite changes behavior when llm.callSites.inviteInstructionGenerator is not configured. LLMSchema defaults callSites to {}, and migration 038 does not seed this call site, so resolution falls back to llm.default (high-latency model by default). Because this path has a 5s timeout, the slower model will increase fallback-to-template frequency and regress instruction quality/latency for default installations.
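A rough sketch of the fallback behavior described above, assuming a resolver that checks `llm.callSites` first and otherwise returns `llm.default`. The types, the `resolveModelFor` helper, and both model IDs are placeholders, not the repo's actual schema or values:

```typescript
// Hypothetical, simplified shapes; not the repo's actual LLMSchema types.
interface ModelChoice {
  model: string;
}

interface LlmConfigSketch {
  default: ModelChoice;
  callSites: Record<string, Partial<ModelChoice> | undefined>;
}

// Resolve a call site: use its override when present, else fall back to llm.default.
function resolveModelFor(llm: LlmConfigSketch, callSite: string): string {
  return llm.callSites[callSite]?.model ?? llm.default.model;
}

// Default install as described in the review: callSites is {} and nothing is seeded.
const unseeded: LlmConfigSketch = {
  default: { model: "high-latency-model" }, // placeholder model ID
  callSites: {},
};

// The same install after a seed step adds a latency-optimized entry for this call site.
const seeded: LlmConfigSketch = {
  ...unseeded,
  callSites: { inviteInstructionGenerator: { model: "low-latency-model" } }, // placeholder
};

const beforeSeed = resolveModelFor(unseeded, "inviteInstructionGenerator");
const afterSeed = resolveModelFor(seeded, "inviteInstructionGenerator");
```

Under these assumptions the unseeded install resolves to the default model, which is the regression the comment warns about; seeding the call site restores a latency-optimized choice without reintroducing `modelIntent`.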
  {
    config: {
-     modelIntent: "latency-optimized",
+     callSite: "interactionClassifier",
🔴 callSite not passed to getConfiguredProvider() in classifier — per-call-site provider overrides ignored
The PR adds callSite: "interactionClassifier" to the config at line 68, but getConfiguredProvider() at assistant/src/daemon/classifier.ts:31 is called without the callSite argument. This means the provider is selected via the legacy services.inference.provider path, while the RetryProvider resolves model/params from the call-site config. If a user configures llm.callSites.interactionClassifier.provider = "openai", the classifier would still use the default provider (e.g. anthropic), and the resolved model (e.g. gpt-4o) would be sent to the wrong provider, causing an API error. The test at assistant/src/providers/__tests__/retry-callsite.test.ts:302-314 demonstrates the expected pattern: getConfiguredProvider("heartbeatAgent") with the call site passed as an argument.
Prompt for agents
In assistant/src/daemon/classifier.ts, the getConfiguredProvider() call at line 31 needs to receive the callSite argument so that per-call-site provider overrides are respected. Change line 31 from `const provider = await getConfiguredProvider()` to `const provider = await getConfiguredProvider("interactionClassifier")`. The same fix pattern applies to the other three files changed in this PR: style-analyzer.ts:130, skills.ts:1520, and invite-instruction-generator.ts:73 (which uses resolveConfiguredProvider instead).
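One way to keep the two arguments from drifting apart is to take the call-site ID once and use it for both provider selection and the per-call config. The sketch below uses hypothetical stand-ins (`pickProviderSketch`, `prepareCall`, `providerOverrides`); the real `getConfiguredProvider()` is async and its exact signature is not shown here:

```typescript
// Hypothetical stand-ins; the real getConfiguredProvider() is async and lives in the repo.
type CallSite =
  | "interactionClassifier"
  | "styleAnalyzer"
  | "skillCategoryInference"
  | "inviteInstructionGenerator";

interface ProviderSketch {
  name: string;
}

// Per-call-site provider overrides (illustrative values only).
const providerOverrides: Partial<Record<CallSite, string>> = {
  interactionClassifier: "openai",
};
const defaultProviderName = "anthropic";

// Stand-in for provider selection: honors the override only when a call site is passed.
function pickProviderSketch(callSite?: CallSite): ProviderSketch {
  const override = callSite ? providerOverrides[callSite] : undefined;
  return { name: override ?? defaultProviderName };
}

// Takes the ID once and uses it for both provider selection and per-call config.
function prepareCall(callSite: CallSite): {
  provider: ProviderSketch;
  config: { callSite: CallSite };
} {
  return { provider: pickProviderSketch(callSite), config: { callSite } };
}

// Without the argument the override is ignored; with it, provider and config agree.
const withoutArg = pickProviderSketch();
const withArg = prepareCall("interactionClassifier");
```

Here `withoutArg` still names the default provider, mirroring the bug, while `withArg` carries the overridden provider alongside the matching `callSite` config.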
- { signal: AbortSignal.timeout(30_000) },
+ {
+   signal: AbortSignal.timeout(30_000),
+   config: { callSite: "styleAnalyzer" },
🔴 callSite not passed to getConfiguredProvider() in style-analyzer — per-call-site provider overrides ignored
The PR adds callSite: "styleAnalyzer" at line 152, but getConfiguredProvider() at assistant/src/messaging/style-analyzer.ts:130 is called without the callSite argument. Same root cause as the classifier: the provider is selected via the legacy path while model/params are resolved via the call-site config, so llm.callSites.styleAnalyzer.provider overrides are silently ignored. If a user configures a different provider for this call site, the model resolved from the call-site config may be incompatible with the selected provider.
Prompt for agents
In assistant/src/messaging/style-analyzer.ts, the getConfiguredProvider() call at line 130 needs to receive the callSite argument: change `const provider = await getConfiguredProvider()` to `const provider = await getConfiguredProvider("styleAnalyzer")` so that per-call-site provider overrides are respected.
  undefined,
  {
-   config: { modelIntent: "latency-optimized", max_tokens: 256 },
+   config: { callSite: "skillCategoryInference", max_tokens: 256 },
🔴 callSite not passed to getConfiguredProvider() in skills handler — per-call-site provider overrides ignored
The PR adds callSite: "skillCategoryInference" at line 1542, but getConfiguredProvider() at assistant/src/daemon/handlers/skills.ts:1520 is called without the callSite argument. Same root cause as the other sites: per-call-site provider overrides in llm.callSites.skillCategoryInference are silently ignored.
Prompt for agents
In assistant/src/daemon/handlers/skills.ts, the getConfiguredProvider() call at line 1520 needs to receive the callSite argument: change `const provider = await getConfiguredProvider()` to `const provider = await getConfiguredProvider("skillCategoryInference")` so that per-call-site provider overrides are respected.
  undefined,
  undefined,
- { signal, config: { modelIntent: "latency-optimized" } },
+ { signal, config: { callSite: "inviteInstructionGenerator" } },
🔴 callSite not passed to resolveConfiguredProvider() in invite-instruction-generator — per-call-site provider overrides ignored
The PR adds callSite: "inviteInstructionGenerator" at line 126, but resolveConfiguredProvider() at assistant/src/runtime/invite-instruction-generator.ts:73 is called without the callSite argument. Same root cause as the other sites: per-call-site provider overrides in llm.callSites.inviteInstructionGenerator are silently ignored.
Prompt for agents
In assistant/src/runtime/invite-instruction-generator.ts, the resolveConfiguredProvider() call at line 73 needs to receive the callSite argument: change `const resolved = await resolveConfiguredProvider()` to `const resolved = await resolveConfiguredProvider("inviteInstructionGenerator")` so that per-call-site provider overrides are respected.
fe3977f into siddseethepalli/unify-llm-callsites
…es} (#26159)

* config(llm): add unified llm schema with call-site enum and profile refines (#26089)
* config(llm): add unified llm schema with call-site enum and profile refines
* fix(llm-schema): replace deepPartialObject helper with explicit .partial().extend()
  Zod 4's readonly shape typing tripped TS2542 in the LSP for the generic walker. Inline the one-level expansion for ContextWindowSchema and switch the superRefine issue code to the string literal (Zod 4 deprecated ZodIssueCode).
* config(llm): add resolveCallSiteConfig resolver with deep merge (#26094)
* config(llm): add resolveCallSiteConfig resolver with deep merge
* fix(llm-resolver): deep-clone nested objects so resolved configs are isolated snapshots
  Codex flagged that the merge helper aliased nested objects from llm.default when no override touched them, so a caller mutating the returned config would silently corrupt the source. Recurse into plain-object sources unconditionally and add a regression test.
* config(llm): add llm field to AssistantConfigSchema (no behavior change) (#26095)
* config(llm): add llm field to AssistantConfigSchema (no behavior change)
* fix(llm-schema): add field-level defaults so partial llm configs don't trigger full config reset
  Codex flagged that requiring all LLMConfigBase fields meant the loader's leaf-deletion recovery couldn't repair partial/invalid llm blocks, falling through to cloneDefaultConfig() and discarding the user's other valid settings. Add .default(...) to every leaf so LLMSchema.parse({}) returns a fully-defaulted object, matching the pattern used by sibling config schemas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* providers: accept callSite in per-call config; resolve via resolveCallSiteConfig (#26102)
* workspace: migrate scattered LLM config keys into unified llm structure (#26101)
* workspace: migrate scattered LLM config keys into unified llm structure
* fix(migration): preserve existing llm subtree; map notification intent to both call sites
  Codex flagged two issues:
  - The migration assignment replaced config.llm wholesale, destroying any pre-existing llm.callSites/profiles when llm.default was absent. Now merges into existing config.llm, preserving non-conflicting entries.
  - notifications.decisionModelIntent drives both notification classification and preference extraction, but the migration only seeded notificationDecision. Now seeds both call sites.
* memory: route extraction/consolidation/retrieval through call-site IDs (#26106)
* memory: route narrative/pattern/summarization/starters through call-site IDs (#26107)
* notifications: route decision and preference extraction through call-site IDs (#26109)
* calls+watcher: route guardian copy and watch handlers through call-site IDs (#26105)
* utility: route classifier and analyzer LLM calls through call-site IDs (#26111)
* macos(settings): migrate InferenceServiceCard reads/writes to llm.default.* (#26113)
* workspace+conversation: route commit message and title through call-site IDs (#26112)
* ui: route identity intro and empty-state greeting through call-site IDs (#26108)
* daemon: thread callSite through processMessage options and adapter callbacks (#26115)
* daemon: thread callSite through processMessage options and adapter callbacks
* fix(callsite-threading): complete interface contract and server.ts symmetry
  Devin flagged two gaps in PR #26115:
  - ProcessConversationContext interface missing callSite in its runAgentLoop options type (works via structural typing but contract was incomplete; mocks would silently drop the field).
  - DaemonServer.persistAndProcessMessage didn't thread callSite to conversation.runAgentLoop, while DaemonServer.processMessage did. Aligned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(callsite): don't default unspecified callers to 'mainAgent'
  Codex flagged that defaulting to mainAgent for every turn routes them through the new RetryProvider call-site resolver, which reads from llm.default, but config-model.setModel still writes to services.inference without syncing llm.default. Result: stale/incompatible model IDs after a model switch. Defer the cutover. agent-loop turns now keep using the legacy modelIntent path (turnCallSite = options?.callSite, no fallback). PRs 7-11 still explicitly pass callSite and route through the new resolver as intended.

---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* heartbeat: pass callSite: 'heartbeatAgent' instead of speed kwarg (#26125)
* filing: pass callSite: 'filingAgent' instead of speed kwarg (#26124)
* runtime/analyze-conversation: route through callSite: 'analyzeConversation' (#26126)
* subagent: pass callSite: 'subagentSpawn' when spawning isolated agents (#26122)
* calls: route the call agent loop through callSite: 'callAgent' (#26123)
* macos(settings): add SettingsStore APIs for per-call-site overrides (#26128)
* macos(settings): add SettingsStore APIs for per-call-site overrides
* fix(callsite-overrides): harden setCallSiteOverrides against dup-id crash and batch divergence
  Devin and Codex flagged two issues:
  - Dictionary(uniqueKeysWithValues:) crashes if callers pass duplicate CallSiteOverride.id values (external input; must be tolerant). Switch to Dictionary(_:uniquingKeysWith:) with last-write-wins.
  - Batch updates locally cleared entries omitted from the input but only PATCHed entries that were present, so omitted entries appeared cleared in the UI but reappeared on next sync. Now the PATCH payload includes NSNull clears for every catalog entry not in the batch, aligning remote with local.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(callsite-overrides): null entire entry on clear so non-UI leaves get cleared too
  Codex P2 (PR #26128 cycle 2): clearCallSiteOverride only nulled provider/model/profile, but call-site config supports additional leaves (maxTokens, effort, speed, thinking, contextWindow). If those were set via manual edits, the UI would report cleared while the daemon kept applying hidden overrides. Switch the PATCH payload from { provider: null, model: null, profile: null } to a single null on the entry itself. The Zod fragment treats null as absent, so the resolver falls back to llm.default. Same fix applies to the omitted-catalog-entry clears in setCallSiteOverrides batch. Tests updated to assert the new shape.

---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* macos(settings): confirm default-provider switch when call-site overrides exist (#26133)
* macos(settings): show 'N call-site overrides' badge with read-only list sheet (#26135)
* macos(settings): show 'N call-site overrides' badge with read-only list sheet
* fix(comments): drop PR-number breadcrumbs in callsite override files
  Devin flagged that comments referencing PR 22/23/24 violate clients/AGENTS.md 'Comment Quality' rule (no breadcrumbs). Replaced with timeless descriptions of code intent.
* macos(settings): make per-task override sheet editable with provider/model pickers (#26136)
* macos(settings): make per-task override sheet editable with provider/model pickers
* fix(callsite-sheet): preserve external updates and seed override from active default provider
  Codex flagged two P1s:
  - syncDraftsFromStore compared drafts against the NEW persisted value to decide 'touched', so external store updates were treated as user edits and got overwritten by Save All. Track the previously-persisted value in lastSyncedFromStore and consider a row touched only when the draft differs from that baseline.
  - Toggling 'Override default' on initialized provider from providerIds.first instead of the user's actual default provider, which could pin the wrong provider on save. Pass the user's default provider into CallSiteOverrideRow and seed from it.
* fix(callsite-sheet): use entry-level null path for cleared rows in saveAll/resetAll
  Devin flagged that saveAll() and resetAll() were passing all-nil entries to setCallSiteOverrides, which routed them through the field-level null path (provider/model/profile = null). That left advanced leaves (maxTokens, effort, temperature, contextWindow) untouched on the daemon. Fix:
  - saveAll(): filter to entries with hasOverride == true; toggled-off rows fall through to the entry-level null path.
  - resetAll(): pass an empty list so every catalog entry hits the entry-level null path.
* config(llm): remove deprecated scattered LLM keys (#26140)
* fix(config-loader): treat JSON null as key deletion in deepMergeOverwrite (#26153)
* fix(agent-loop): default user-initiated turns to callSite: 'mainAgent' (#26154)
* fix(meet-join): migrate consent-monitor + session-manager to callSite contract (#26155)
* fix(macos): atomic provider+model save via single PATCH (#26156)
* fix(cleanup): remove dead code, refresh comments, add migration test, update docs (#26157)
* fix(r2): catalog test count, skill self-knowledge doc, AGENTS.md, loader docstring (#26158)
* fix(llm-callsite): refresh stale docstring, restore overflow budget, restore SettingsStore fallback (#26252)
* fix(llm-callsite): route provider transport and field precedence through callSite (#26254)
* fix(llm-callsite): pass CI + address subagent/thinking/temperature review comments (#26258)
* test(extension-id-guard): allow CWS URL matches; mirrors main PR #26263 (#26270)
* fix(llm-callsite): UI override state divergence, null-as-delete, migration gaps (#26271)
* Fix Chrome extension allowlist ID and clarify README dev setup (#26259)
  Update the canonical allowlist to use the correct published CWS extension ID (hphbdmpffeigpcdjkckleobjmhhokpne). Restructure the Chrome extension README to clearly explain the allowlist merge strategy, separate the macOS app (automatic) path from the manual native messaging setup, and show how dev + prod extensions work side-by-side.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(clients): enable non-contiguous glyph layout for NSTextView-backed code views (#26242)
  TextKit 1 defaults NSLayoutManager.allowsNonContiguousLayout to false, which forces full-document glyph layout from character 0 on the main thread whenever a glyph range is queried. Attaching an NSTextView to its scroll view (setDocumentView: -> _setSuperview: -> setNeedsDisplayInRect: -> _glyphRangeForBoundingRect:) triggers that query during makeNSView, producing multi-second hangs on large code blocks. Opt into non-contiguous layout on every TextKit 1 stack we build via NSViewRepresentable so glyph generation is confined to the requested bounding rect. Also replace NSLayoutManager.ensureLayout(for:) in the code-view sizeThatFits paths with direct lineCount * fixedLineHeight math: the text container is unbounded horizontally (no wrapping) and paragraph style pins minimumLineHeight == maximumLineHeight, so the geometry is exact and avoids a second O(glyph count) main-thread path. Fixes VELLUM-ASSISTANT-MACOS-J2.

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

* fix(contacts): show Assistant badge for assistant-type contacts (LUM-1009) (#26239)
* fix(contacts): show Assistant badge for assistant-type contacts (LUM-1009)
* Move role/contactType derivation onto Kind for valid initializer

---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(llm-callsite): UI override state divergence, null-as-delete, migration gaps
  - deepMergeOverwrite: null on scalar/null targets assigns null (preserves nullable config fields like activeHoursStart); null on object targets still deletes (call-site clearing). Fixes regression where PATCH with null for nullable fields was deleted then re-defaulted.
  - InferenceServiceCard: override confirmation dialog only fires when the resolved provider ID actually changes, not on mode-only toggles where both old and new resolve to the same provider.
  - CallSiteOverridesSheet: per-row Save uses replaceCallSiteOverride (clear-then-set) so stale daemon-side leaves are removed. The partial-update setCallSiteOverride would retain fields the draft nil'd.
  - CallSiteOverrideRow: merge consecutive .padding modifiers into single EdgeInsets call per macOS AGENTS.md layout rule.
  - SettingsStore: add replaceCallSiteOverride for full-entry replacement.

---------
Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>

* fix(llm-callsite): seed latency-optimized defaults and fix guardian provider routing (#26275)
* fix(meet-bot): address review feedback: Docker build, scraper races, audio capture, storage writer (#26264)
* fix(meet): chat concurrency, dispose teardown, and wake adapter fidelity (#26265)
* fix: heartbeat dual-emit, analysis dedup, test hermiticity, credential executor discovery (#26266)
* fix: model default fallback, empty-response nudge scan (#26268)
  - Update FALLBACK_DEFAULT_MODEL to claude-opus-4-7 + test
  - Fix resolveModel to check Anthropic catalog (not just current default) so stale persisted defaults (e.g. claude-opus-4-6) don't get sent to non-Anthropic providers
  - Fix priorAssistantHadVisibleText backward scan to check ALL prior assistant messages, not just the most recent one
  Addresses review feedback from PRs #26247, #26164.
* fix(meet): TTS stream races, barge-in tracking, ffmpeg error classification (#26267)
* Fix extension-id-sync-guard test after canonical ID update (#26263)
  The guard test asserts that canonical extension IDs appear only in the allowlist config file. After updating the canonical ID to match the published CWS extension, it now collides with CWS URLs in README and browser-execution.ts. Fix by stripping CWS URLs before checking for bare ID occurrences, and ignore .codex-worktrees (repo copies). Also remove hardcoded CWS ID from README in favor of reading from the canonical config.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(llm-callsite): seed latency-optimized defaults, fix guardian provider routing, clean stale comments
  - Add LATENCY_OPTIMIZED_CALLSITE_DEFAULTS to schema for new installs
  - Create migration 040 to seed latency-optimized call-site entries for existing workspaces
  - Fix guardian-action-generators to use getConfiguredProvider() instead of bypassing call-site resolution
  - Restore commitMessage maxTokens: 120 and temperature: 0.2 via call-site defaults
  - Remove stale PR-reference comments from analyze-conversation.ts and voice-session-bridge.ts
  Addresses consolidated review feedback from PRs #26101-#26140.

---------
Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(retry): stop forwarding contextWindow/provider to provider request body (#26280)
* chore(skills): regenerate catalog.json

---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Noa Flaherty <noa@vellum.ai>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
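The null handling that the commit message attributes to deepMergeOverwrite (null assigns on scalar or null targets, null deletes on object targets) might look roughly like the sketch below. `mergeSketch` is a hypothetical stand-in, not the repo's function, and its treatment of keys that are absent from the target (it assigns null) is an assumption the source does not specify:

```typescript
// Hypothetical stand-in for the merge described in the commit message; not the repo's code.
type Json = null | string | number | boolean | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function mergeSketch(target: JsonObject, patch: JsonObject): JsonObject {
  const out: JsonObject = { ...target }; // shallow copy; merged branches are rebuilt below
  for (const [key, value] of Object.entries(patch)) {
    const current = out[key];
    if (value === null) {
      if (isPlainObject(current)) {
        delete out[key]; // null on an object target deletes the key (call-site clearing)
      } else {
        out[key] = null; // null on a scalar/null target is kept (assumed for absent keys too)
      }
    } else if (isPlainObject(value) && isPlainObject(current)) {
      out[key] = mergeSketch(current, value); // recurse into nested objects
    } else {
      out[key] = value; // scalars and arrays overwrite
    }
  }
  return out;
}

const merged = mergeSketch(
  { activeHoursStart: 9, callSites: { styleAnalyzer: { model: "some-model" } } },
  { activeHoursStart: null, callSites: { styleAnalyzer: null } },
);
```

With this input, `merged.activeHoursStart` stays present as null while `callSites.styleAnalyzer` is removed, which matches the two cases the commit message distinguishes.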
Summary

- `daemon/classifier.ts` -> `callSite: 'interactionClassifier'`.
- `messaging/style-analyzer.ts` -> `callSite: 'styleAnalyzer'` (added; previous code had no per-call config).
- `runtime/invite-instruction-generator.ts` -> `callSite: 'inviteInstructionGenerator'`.
- `daemon/handlers/skills.ts` -> `callSite: 'skillCategoryInference'` (keeps per-call `max_tokens: 256` override since it's a tight budget).

Audit of remaining `modelIntent:` literals (per plan PR 18 step 5)

Each plan-flagged candidate was classified:

- `daemon/conversation.ts:341` - deferred. This is a constructor parameter type signature (`modelIntent?: ModelIntent`), not a per-call config literal. The actual LLM call lives inside the AgentLoop; routing it through a `mainAgent` call site is a much larger refactor that needs to thread `callSite` through the agent loop. Out of scope for utility classifiers.
- `runtime/routes/conversation-routes.ts:2184` - deferred. This generates a tab-complete autocomplete suggestion for the user's next reply. Doesn't fit any existing call-site ID (not `identityIntro`, not `emptyStateGreeting`, not `mainAgent`). Needs a new call-site ID like `replySuggestion` - flagged for follow-up.
- `daemon/guardian-action-generators.ts:55` - deferred. This is guardian action copy (timeout messages, etc.), distinct in purpose from `guardianQuestionCopy` (which is question text). Folding both under `guardianQuestionCopy` would lose per-call-site tunability later. Needs a new `guardianActionCopy` call-site ID.
- `daemon/guardian-action-generators.ts:149` - deferred. Guardian follow-up disposition decision (tool-use call producing a structured disposition + reply). Different from action copy. Needs a new `guardianFollowupDecision` call-site ID.
- `config/skills.ts:1166` - deferred. This is the icon generation call (returns a 16x16 SVG), NOT category inference. The `skillCategoryInference` ID does not fit semantically. Needs a new `skillIconGeneration` call-site ID.

Other production `modelIntent:` literals not in this PR's scope are owned by sibling plan PRs:

- `conversation-title-service.ts`, `btw-sidechain.ts` / `btw-routes.ts`, `decision-engine.ts`, `guardian-question-copy.ts`, `watch-handler.ts`.
- `memory/graph/*`, `memory/job-handlers/*`, and `analyze-conversation.ts` are owned by earlier wiring PRs (7-13).
- `runtime/routes/diagnostics-routes.ts` is a diagnostics endpoint without a clear call-site ID match.

Tests: `classifier.test.ts` (6 pass), `invite-routes-http.test.ts` (25 pass), all skills-handler tests pass individually. `bunx tsc --noEmit` passes.

Part of plan: unify-llm-callsites.md (PR 18 of 24)