-
Notifications
You must be signed in to change notification settings - Fork 88
Refactor LLM callsites to use provider model intents #7943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| import { describe, expect, test } from 'bun:test'; | ||
| import type { Message, Provider, ProviderResponse, SendMessageOptions } from '../providers/types.js'; | ||
| import { RetryProvider } from '../providers/retry.js'; | ||
| import { getProviderDefaultModel, isModelIntent, resolveModelIntent } from '../providers/model-intents.js'; | ||
|
|
||
| const DUMMY_MESSAGES: Message[] = [ | ||
| { role: 'user', content: [{ type: 'text', text: 'hello' }] }, | ||
| ]; | ||
|
|
||
| function makeResponse(model: string): ProviderResponse { | ||
| return { | ||
| content: [{ type: 'text', text: 'ok' }], | ||
| model, | ||
| usage: { | ||
| inputTokens: 1, | ||
| outputTokens: 1, | ||
| }, | ||
| stopReason: 'end_turn', | ||
| }; | ||
| } | ||
|
|
||
| function makeProvider( | ||
| name: string, | ||
| onCall: (options: SendMessageOptions | undefined) => void, | ||
| ): Provider { | ||
| return { | ||
| name, | ||
| async sendMessage(_messages, _tools, _systemPrompt, options) { | ||
| onCall(options); | ||
| const config = options?.config as Record<string, unknown> | undefined; | ||
| return makeResponse((config?.model as string | undefined) ?? 'default-model'); | ||
| }, | ||
| }; | ||
| } | ||
|
|
||
| describe('model intents', () => { | ||
| test('validates model intent strings', () => { | ||
| expect(isModelIntent('latency-optimized')).toBe(true); | ||
| expect(isModelIntent('quality-optimized')).toBe(true); | ||
| expect(isModelIntent('vision-optimized')).toBe(true); | ||
| expect(isModelIntent('fastest-model')).toBe(false); | ||
| expect(isModelIntent(undefined)).toBe(false); | ||
| }); | ||
|
|
||
| test('resolves intent to provider-specific model', () => { | ||
| expect(resolveModelIntent('anthropic', 'latency-optimized')).toBe('claude-haiku-4-5-20251001'); | ||
| expect(resolveModelIntent('anthropic', 'quality-optimized')).toBe('claude-opus-4-6'); | ||
| expect(resolveModelIntent('anthropic', 'vision-optimized')).toBe('claude-sonnet-4-6'); | ||
| expect(resolveModelIntent('openai', 'latency-optimized')).toBe('gpt-4o-mini'); | ||
| }); | ||
|
|
||
| test('falls back to provider default for unknown providers', () => { | ||
| expect(getProviderDefaultModel('unknown-provider')).toBe('claude-opus-4-6'); | ||
| expect(resolveModelIntent('unknown-provider', 'quality-optimized')).toBe('claude-opus-4-6'); | ||
| }); | ||
| }); | ||
|
|
||
| describe('RetryProvider model intent normalization', () => { | ||
| test('translates modelIntent into concrete model and strips modelIntent key', async () => { | ||
| let seen: SendMessageOptions | undefined; | ||
| const wrapped = new RetryProvider(makeProvider('anthropic', (options) => { | ||
| seen = options; | ||
| })); | ||
|
|
||
| await wrapped.sendMessage(DUMMY_MESSAGES, undefined, undefined, { | ||
| config: { | ||
| modelIntent: 'quality-optimized', | ||
| max_tokens: 123, | ||
| }, | ||
| }); | ||
|
|
||
| const config = seen?.config as Record<string, unknown>; | ||
| expect(config.model).toBe('claude-opus-4-6'); | ||
| expect(config.modelIntent).toBeUndefined(); | ||
| expect(config.max_tokens).toBe(123); | ||
| }); | ||
|
|
||
| test('explicit model override wins over modelIntent', async () => { | ||
| let seen: SendMessageOptions | undefined; | ||
| const wrapped = new RetryProvider(makeProvider('openai', (options) => { | ||
| seen = options; | ||
| })); | ||
|
|
||
| await wrapped.sendMessage(DUMMY_MESSAGES, undefined, undefined, { | ||
| config: { | ||
| model: 'custom-model-v1', | ||
| modelIntent: 'latency-optimized', | ||
| }, | ||
| }); | ||
|
|
||
| const config = seen?.config as Record<string, unknown>; | ||
| expect(config.model).toBe('custom-model-v1'); | ||
| expect(config.modelIntent).toBeUndefined(); | ||
| }); | ||
| }); | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| import type { ModelIntent } from './types.js'; | ||
|
|
||
| const PROVIDER_DEFAULT_MODELS = { | ||
| anthropic: 'claude-opus-4-6', | ||
| openai: 'gpt-5.2', | ||
| gemini: 'gemini-3-flash', | ||
| ollama: 'llama3.2', | ||
| fireworks: 'accounts/fireworks/models/kimi-k2p5', | ||
| openrouter: 'x-ai/grok-4', | ||
| } as const; | ||
|
|
||
| type KnownProviderName = keyof typeof PROVIDER_DEFAULT_MODELS; | ||
|
|
||
| const PROVIDER_MODEL_INTENTS: Record<KnownProviderName, Record<ModelIntent, string>> = { | ||
| anthropic: { | ||
| 'latency-optimized': 'claude-haiku-4-5-20251001', | ||
| 'quality-optimized': 'claude-opus-4-6', | ||
| 'vision-optimized': 'claude-sonnet-4-6', | ||
| }, | ||
| openai: { | ||
| 'latency-optimized': 'gpt-4o-mini', | ||
| 'quality-optimized': 'gpt-5.2', | ||
| 'vision-optimized': 'gpt-4o', | ||
| }, | ||
| gemini: { | ||
| 'latency-optimized': 'gemini-3-flash', | ||
| 'quality-optimized': 'gemini-3-flash', | ||
| 'vision-optimized': 'gemini-3-flash', | ||
| }, | ||
| ollama: { | ||
| 'latency-optimized': 'llama3.2', | ||
| 'quality-optimized': 'llama3.2', | ||
| 'vision-optimized': 'llama3.2', | ||
| }, | ||
| fireworks: { | ||
| 'latency-optimized': 'accounts/fireworks/models/kimi-k2p5', | ||
| 'quality-optimized': 'accounts/fireworks/models/kimi-k2p5', | ||
| 'vision-optimized': 'accounts/fireworks/models/kimi-k2p5', | ||
| }, | ||
| openrouter: { | ||
| 'latency-optimized': 'x-ai/grok-4', | ||
| 'quality-optimized': 'x-ai/grok-4', | ||
| 'vision-optimized': 'x-ai/grok-4', | ||
| }, | ||
| }; | ||
|
|
||
| const MODEL_INTENTS = new Set<ModelIntent>([ | ||
| 'latency-optimized', | ||
| 'quality-optimized', | ||
| 'vision-optimized', | ||
| ]); | ||
|
|
||
| export function isModelIntent(value: unknown): value is ModelIntent { | ||
| return typeof value === 'string' && MODEL_INTENTS.has(value as ModelIntent); | ||
| } | ||
|
|
||
| export function getProviderDefaultModel(providerName: string): string { | ||
| const knownProvider = providerName as KnownProviderName; | ||
| return PROVIDER_DEFAULT_MODELS[knownProvider] ?? PROVIDER_DEFAULT_MODELS.anthropic; | ||
| } | ||
|
|
||
| export function resolveModelIntent(providerName: string, intent: ModelIntent): string { | ||
| const knownProvider = providerName as KnownProviderName; | ||
| const providerIntentModels = PROVIDER_MODEL_INTENTS[knownProvider]; | ||
| if (providerIntentModels) { | ||
| return providerIntentModels[intent]; | ||
| } | ||
| return getProviderDefaultModel(providerName); | ||
| } | ||
|
|
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Summary generation silently upgraded from Sonnet to Opus due to intent taxonomy gap
The
generateSummaryfunction inwatch-handler.tspreviously usedmodel: 'claude-sonnet-4-6'(mid-tier). The migration changes it tomodelIntent: 'quality-optimized', which resolves toclaude-opus-4-6for the Anthropic provider (assistant/src/providers/model-intents.ts:17). This is a significant model upgrade — Opus is substantially more expensive and slower than Sonnet.Root cause: intent taxonomy lacks a mid-tier mapping
The intent taxonomy only offers three tiers:
latency-optimized→ Haiku (cheap/fast)quality-optimized→ Opus (expensive/best)vision-optimized→ Sonnet (mid-tier, but semantically tied to vision)The original Sonnet was a deliberate choice for summary generation — good enough quality at moderate cost. The migration had no semantically appropriate intent for a "balanced/general-purpose" tier, so
quality-optimizedwas chosen, but it maps to the most expensive model.For comparison, the commentary generation in the same file correctly maps Haiku →
latency-optimized→ Haiku (1:1 equivalent atwatch-handler.ts:129). But summary maps Sonnet →quality-optimized→ Opus, which is NOT a 1:1 equivalent.Impact: Every watch session summary now uses Opus instead of Sonnet, increasing API cost significantly (~5x per summary call) and potentially increasing latency for users.
Prompt for agents
Was this helpful? React with 👍 or 👎 to provide feedback.