Merged
4 changes: 2 additions & 2 deletions assistant/src/__tests__/conversation-title-service.test.ts
@@ -85,7 +85,7 @@ describe("conversation-title-service", () => {
provider,
systemPrompt: expect.stringContaining("conversation titles"),
tools: [],
- modelIntent: "quality-optimized",
+ callSite: "conversationTitle",
timeoutMs: 10_000,
}),
);
@@ -203,7 +203,7 @@ describe("conversation-title-service", () => {
provider,
systemPrompt: expect.stringContaining("conversation titles"),
tools: [],
- modelIntent: "quality-optimized",
+ callSite: "conversationTitle",
timeoutMs: 10_000,
}),
);
17 changes: 13 additions & 4 deletions assistant/src/__tests__/provider-commit-message-generator.test.ts
@@ -221,7 +221,7 @@ describe("ProviderCommitMessageGenerator", () => {
});

// 6. LLM success
- test('LLM success → returns LLM message, source "llm", fast model passed', async () => {
+ test('LLM success → returns LLM message, source "llm", fast model + callSite passed', async () => {
const commitMsg = "feat: add new feature";
mockSendMessage.mockResolvedValueOnce(makeSuccessResponse(commitMsg));
const gen = getCommitMessageGenerator();
@@ -232,10 +232,16 @@ describe("ProviderCommitMessageGenerator", () => {
expect(result.message).toBe(commitMsg);
expect(result.reason).toBeUndefined();

- // Verify the fast model was passed in the config
+ // Verify the fast model and callSite were passed in the config so the
+ // provider's RetryProvider routes through `resolveCallSiteConfig` for
+ // max_tokens/temperature while preserving the explicit fast-model
+ // override.
const callArgs = mockSendMessage.mock.calls[0];
- const options = callArgs[3] as { config: { model: string } };
+ const options = callArgs[3] as {
+ config: { model: string; callSite: string };
+ };
expect(options.config.model).toBe("claude-haiku-4-5-20251001");
+ expect(options.config.callSite).toBe("commitMessage");
Comment on lines +240 to +244

🚩 Test mocks bypass RetryProvider resolution, hiding the max_tokens regression

The commit message generator test at assistant/src/__tests__/provider-commit-message-generator.test.ts:77-79 mocks resolveConfiguredProvider to return a bare mock provider (not a RetryProvider). This means callSite resolution via normalizeViaCallSite never runs in the test. The test at line 243 only verifies that callSite: "commitMessage" is present in the config object, not that the resolution produces correct max_tokens/temperature values. This is why the regression from the reported bug passes tests — the test doesn't exercise the production code path where RetryProvider resolves callSite into concrete config values.
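
The gap described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `makeRecorder`, `withResolution`, and the `CallConfig`/`SiteConfig` shapes are invented for illustration and are not the real `RetryProvider` or `normalizeViaCallSite`. The only values taken from this review are the `llm.default` figures (`maxTokens: 64000`, `temperature: null`). The sketch assumes the resolver fills `max_tokens` and `temperature` only when the caller left them undefined.

```typescript
// Hypothetical stand-ins; the real RetryProvider / normalizeViaCallSite
// are not reproduced here and may differ in shape.
type CallConfig = {
  callSite?: string;
  model?: string;
  max_tokens?: number;
  temperature?: number | null;
};
type SiteConfig = { maxTokens: number; temperature: number | null };
type Send = (config: CallConfig) => void;

// Bare mock, like the one the test installs: it records the config it
// receives and performs no call-site resolution.
function makeRecorder(): { calls: CallConfig[]; send: Send } {
  const calls: CallConfig[] = [];
  return {
    calls,
    send: (config) => {
      calls.push(config);
    },
  };
}

// Assumed resolution layer: fills max_tokens/temperature only when the
// caller left them undefined, from the call-site entry or the defaults.
function withResolution(
  inner: Send,
  callSites: Record<string, SiteConfig | undefined>,
  defaults: SiteConfig,
): Send {
  return (config) => {
    const entry =
      config.callSite !== undefined ? callSites[config.callSite] : undefined;
    const site = entry ?? defaults;
    inner({
      ...config,
      max_tokens: config.max_tokens ?? site.maxTokens,
      temperature:
        config.temperature !== undefined ? config.temperature : site.temperature,
    });
  };
}

const llmDefault: SiteConfig = { maxTokens: 64000, temperature: null };
const request: CallConfig = { callSite: "commitMessage", model: "fast-model" };

// Path the current test takes: callSite is present, nothing is resolved.
const bare = makeRecorder();
bare.send(request);

// Path production takes under the assumptions above: no commitMessage
// entry exists, so the defaults are applied before the call goes out.
const resolved = makeRecorder();
withResolution(resolved.send, {}, llmDefault)(request);
```

An assertion on `callSite` passes on both paths, which is why the test stays green. Only an assertion on the resolved `max_tokens` or `temperature`, made against a provider that actually runs resolution, would distinguish them.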


});

// 7. fast-model override
@@ -253,8 +259,11 @@ describe("ProviderCommitMessageGenerator", () => {
expect(result.message).toBe(commitMsg);

const callArgs = mockSendMessage.mock.calls[0];
- const options = callArgs[3] as { config: { model: string } };
+ const options = callArgs[3] as {
+ config: { model: string; callSite: string };
+ };
expect(options.config.model).toBe("claude-sonnet-4-20250514");
+ expect(options.config.callSite).toBe("commitMessage");
});

// 8. LLM timeout
4 changes: 2 additions & 2 deletions assistant/src/memory/conversation-title-service.ts
@@ -133,7 +133,7 @@ export async function generateAndPersistConversationTitle(
provider,
systemPrompt: buildTitleSystemPrompt(),
tools: [],
- modelIntent: "quality-optimized",
+ callSite: "conversationTitle",

🚩 Conversation title generation model and thinking behavior changes

The conversation title service previously used modelIntent: "quality-optimized" which resolved to provider-specific quality models (e.g., claude-opus-4-7 for Anthropic via assistant/src/providers/model-intents.ts:16). After this PR, it uses callSite: "conversationTitle" which, absent an explicit llm.callSites.conversationTitle entry, falls through to llm.default — resolving to claude-opus-4-6 (the LLMConfigBase default at assistant/src/config/schemas/llm.ts:224). This is a minor model change.

More notably, the resolver also injects effort: "max" and thinking: { enabled: true, streamThinking: true } from llm.default. For Anthropic (which is in THINKING_AWARE_PROVIDERS at assistant/src/providers/retry.ts:36), this means extended thinking is now enabled for title generation — a 2-6 word output. The max_tokens: 1024 set by btw-sidechain is preserved (the resolver's if (nextConfig.max_tokens === undefined) check at assistant/src/providers/retry.ts:194 doesn't overwrite it), so token output is bounded, but the thinking overhead adds latency and cost for a trivial generation task. This isn't a correctness bug but is a meaningful efficiency regression worth considering.
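
To make the fall-through concrete, here is a minimal sketch under stated assumptions. `resolveSketch`, `LLMSettings`, and `LLMConfig` are hypothetical names, and the shapes are modeled only on the fields named in this comment, so the real schema in `assistant/src/config/schemas/llm.ts` and the real `resolveCallSiteConfig` may differ. It shows that without an `llm.callSites.conversationTitle` entry the call inherits thinking from `llm.default`, and that an entry could in principle switch it off.

```typescript
// Hypothetical shapes modeled on the fields named in the comment above;
// the real schema may differ.
type Thinking = { enabled: boolean; streamThinking: boolean };
type LLMSettings = {
  model: string;
  maxTokens: number;
  effort: string;
  thinking: Thinking;
};
type LLMConfig = {
  default: LLMSettings;
  callSites: Record<string, Partial<LLMSettings> | undefined>;
};

// Assumed fall-through: a call-site entry overrides individual fields,
// anything it leaves unset comes from llm.default.
function resolveSketch(callSite: string, llm: LLMConfig): LLMSettings {
  return { ...llm.default, ...(llm.callSites[callSite] ?? {}) };
}

// Default values as quoted in the review comment.
const llmDefault: LLMSettings = {
  model: "claude-opus-4-6",
  maxTokens: 64000,
  effort: "max",
  thinking: { enabled: true, streamThinking: true },
};

// No entry: title generation inherits effort "max" and thinking enabled.
const withoutEntry = resolveSketch("conversationTitle", {
  default: llmDefault,
  callSites: {},
});

// Hypothetical entry that turns thinking off for this call site only.
const withEntry = resolveSketch("conversationTitle", {
  default: llmDefault,
  callSites: {
    conversationTitle: {
      thinking: { enabled: false, streamThinking: false },
    },
  },
});
```

Whether the real resolver accepts a per-call-site `thinking` override in this form is an assumption; the sketch only illustrates the lookup order the comment describes.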


signal,
timeoutMs: 10_000,
});
@@ -236,7 +236,7 @@ export async function regenerateConversationTitle(
provider,
systemPrompt: buildTitleSystemPrompt(),
tools: [],
- modelIntent: "quality-optimized",
+ callSite: "conversationTitle",
signal,
timeoutMs: 10_000,
});
12 changes: 11 additions & 1 deletion assistant/src/runtime/btw-sidechain.ts
@@ -1,3 +1,4 @@
+ import type { LLMCallSite } from "../config/schemas/llm.js";
import { buildToolDefinitions } from "../daemon/conversation-tool-setup.js";
import { buildSystemPrompt } from "../prompts/system-prompt.js";
import {
@@ -29,6 +30,13 @@ export interface RunBtwSidechainParams {
systemPrompt?: string;
tools?: ToolDefinition[];
maxTokens?: number;
+ /**
+  * Opt-in routing through the unified LLM call-site resolver. When set, the
+  * provider resolves provider/model/maxTokens/etc. via
+  * `resolveCallSiteConfig(callSite, config.llm)` instead of `modelIntent`.
+  * `callSite` wins when both are passed.
+  */
+ callSite?: LLMCallSite;
modelIntent?: ModelIntent;
signal?: AbortSignal;
timeoutMs?: number;
@@ -89,7 +97,9 @@ export async function runBtwSidechain(
config: {
max_tokens: params.maxTokens ?? 1024,
tool_choice: { type: "none" },
- modelIntent: params.modelIntent ?? "latency-optimized",
+ ...(params.callSite !== undefined
+ ? { callSite: params.callSite }
Comment on lines 98 to +101


P2: Let call-site max_tokens override sidechain defaults

runBtwSidechain now uses callSite, but it still always sends max_tokens: params.maxTokens ?? 1024. In RetryProvider.normalizeViaCallSite, resolved call-site maxTokens is only applied when max_tokens is undefined, so any llm.callSites.<id>.maxTokens setting is ignored (including for conversationTitle). This makes the new call-site path unable to honor configured token budgets.
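
One possible way to address this, sketched under assumptions: build the sidechain config so that the `1024` default is applied only on the `modelIntent` path, and on the `callSite` path `max_tokens` is forwarded only when the caller passed `maxTokens` explicitly. `buildSidechainConfig`, `SidechainParams`, `SidechainConfig`, and `DEFAULT_SIDECHAIN_MAX_TOKENS` are hypothetical, simplified names, not the PR's code.

```typescript
// Hypothetical, simplified version of the config built in runBtwSidechain.
type SidechainParams = {
  maxTokens?: number;
  callSite?: string;
  modelIntent?: string;
};

type SidechainConfig = {
  max_tokens?: number;
  tool_choice: { type: "none" };
  callSite?: string;
  modelIntent?: string;
};

const DEFAULT_SIDECHAIN_MAX_TOKENS = 1024;

function buildSidechainConfig(params: SidechainParams): SidechainConfig {
  const config: SidechainConfig = { tool_choice: { type: "none" } };
  if (params.callSite !== undefined) {
    config.callSite = params.callSite;
    // Forward only an explicit budget. Leaving max_tokens undefined lets a
    // resolver that checks `max_tokens === undefined` apply the call-site value.
    if (params.maxTokens !== undefined) {
      config.max_tokens = params.maxTokens;
    }
  } else {
    // Legacy path: keep the existing intent and token defaults.
    config.modelIntent = params.modelIntent ?? "latency-optimized";
    config.max_tokens = params.maxTokens ?? DEFAULT_SIDECHAIN_MAX_TOKENS;
  }
  return config;
}

const viaCallSite = buildSidechainConfig({ callSite: "conversationTitle" });
const viaCallSiteExplicit = buildSidechainConfig({
  callSite: "conversationTitle",
  maxTokens: 256,
});
const viaIntent = buildSidechainConfig({});
```

The trade-off is that a call site with no configured `maxTokens` would then inherit whatever the resolver's default is, rather than the sidechain's 1024 cap, so this approach only makes sense if the relevant call sites have sensible budgets configured.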


+ : { modelIntent: params.modelIntent ?? "latency-optimized" }),
},
onEvent: (event) => {
if (event.type === "text_delta") {
14 changes: 12 additions & 2 deletions assistant/src/workspace/provider-commit-message-generator.ts
@@ -263,9 +263,19 @@ export class ProviderCommitMessageGenerator {
{
signal: ac.signal,
config: {
+ // `callSite` lets the provider resolve `max_tokens` and
+ // `temperature` from `llm.callSites.commitMessage` (populated by
+ // the workspace migration from the legacy
+ // `workspaceGit.commitMessageLLM.{maxTokens,temperature}` keys).
+ // Operational fields (`enabled`, `timeoutMs`, `breaker`,
+ // `maxFilesInPrompt`, `maxDiffBytes`, `minRemainingTurnBudgetMs`)
+ // remain on `workspaceGit.commitMessageLLM` and are read above.
+ callSite: "commitMessage",
+ // `fastModel` overrides the resolver's `model` because commit
Comment on lines +273 to +274


P2: Preserve commit-message token cap when call-site config is absent

This change removes the explicit workspaceGit.commitMessageLLM.maxTokens pass-through and relies on callSite: "commitMessage" to supply limits. Migration 038 only backfills llm.callSites.commitMessage.maxTokens when the legacy key exists in raw config.json, so workspaces that enabled this feature without explicitly setting maxTokens will now inherit llm.default.maxTokens (default 64000) instead of the previous 120-token cap, increasing latency/cost risk and weakening the intended short-output guardrail.


+ // message generation enforces its own provider-specific fast
+ // model selection (see `PROVIDER_DEFAULT_FAST_MODELS` and
+ // `providerFastModelOverrides`).
model: fastModel,
- max_tokens: llmConfig.maxTokens,
- temperature: llmConfig.temperature,
},
Comment on lines 265 to 279


🔴 Commit message generation loses specialized max_tokens (120→64000) and temperature (0.2→null) for users with default settings

The PR removes the explicit max_tokens: llmConfig.maxTokens (default 120) and temperature: llmConfig.temperature (default 0.2) from the provider call config, relying instead on callSite: "commitMessage" resolution via resolveCallSiteConfig. However, the migration at assistant/src/workspace/migrations/038-unify-llm-callsite-configs.ts:172-185 only populates llm.callSites.commitMessage when maxTokens/temperature are explicitly present in the raw config.json file. Most users rely on Zod schema defaults (120 and 0.2 at assistant/src/config/schemas/workspace-git.ts:110-121), which are applied at parse time and aren't stored on disk. For these users, no llm.callSites.commitMessage entry exists, so resolveCallSiteConfig falls through to llm.default (assistant/src/config/schemas/llm.ts:222-231), which has maxTokens: 64000, temperature: null, effort: "max", and thinking: { enabled: true }. This causes a 533× increase in max_tokens, enables expensive extended thinking on Anthropic, and removes the low temperature setting — all for a call site that generates short commit messages (the system prompt says "Total output must be under 300 characters"). This violates the AGENTS.md backwards compatibility rule: "Never ship a change that silently breaks existing behavior."

Prompt for agents
The commit message generator at provider-commit-message-generator.ts:265-279 replaced explicit max_tokens and temperature with callSite resolution, but the migration (038-unify-llm-callsite-configs.ts) only copies these values when they are explicitly present in config.json. Users relying on Zod schema defaults (maxTokens=120, temperature=0.2) will silently get llm.default values (maxTokens=64000, temperature=null, effort=max, thinking=enabled).

Two possible fixes:

1. Keep the explicit max_tokens and temperature in the sendMessage config alongside callSite, so the normalizeViaCallSite function in retry.ts preserves them (it already has 'per-call explicit fields win' semantics for max_tokens and temperature):
   config: { callSite: 'commitMessage', model: fastModel, max_tokens: llmConfig.maxTokens, temperature: llmConfig.temperature }

2. Update the migration to always write the schema defaults for commitMessage when the commitMessageLLM block exists but the individual keys are absent. This would seed llm.callSites.commitMessage with maxTokens=120 and temperature=0.2 for all users who have the feature enabled.

Option 1 is safer and maintains the existing behavior exactly while still opting into callSite for model/effort/thinking resolution. Option 2 is more aligned with the full callSite migration vision but risks missing edge cases.
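
Option 2 could look roughly like the sketch below. It is illustrative only: `seedCommitMessageCallSite`, `RawConfig`, `SiteOverride`, and `COMMIT_MESSAGE_DEFAULTS` are hypothetical names, and the real migration in `038-unify-llm-callsite-configs.ts` operates on raw `config.json` and may be structured differently. The defaults (120 and 0.2) are the schema defaults quoted above. The sketch assumes an existing call-site value wins, then an explicit legacy key, then the schema default.

```typescript
// Hypothetical shapes; the real migration may be structured differently.
type SiteOverride = { maxTokens?: number; temperature?: number };
type RawConfig = {
  workspaceGit?: {
    commitMessageLLM?: {
      enabled?: boolean;
      maxTokens?: number;
      temperature?: number;
    };
  };
  llm?: { callSites?: Record<string, SiteOverride> };
};

// Schema defaults quoted in the review comment.
const COMMIT_MESSAGE_DEFAULTS = { maxTokens: 120, temperature: 0.2 };

function seedCommitMessageCallSite(raw: RawConfig): RawConfig {
  const legacy = raw.workspaceGit?.commitMessageLLM;
  // Feature block absent on disk: nothing to seed.
  if (legacy === undefined) return raw;

  const existing: SiteOverride = raw.llm?.callSites?.commitMessage ?? {};
  const seeded = {
    // Precedence: existing call-site value, explicit legacy key, schema default.
    maxTokens:
      existing.maxTokens ?? legacy.maxTokens ?? COMMIT_MESSAGE_DEFAULTS.maxTokens,
    temperature:
      existing.temperature ??
      legacy.temperature ??
      COMMIT_MESSAGE_DEFAULTS.temperature,
  };

  return {
    ...raw,
    llm: {
      ...raw.llm,
      callSites: { ...raw.llm?.callSites, commitMessage: seeded },
    },
  };
}

// Block present, individual keys absent: schema defaults are written.
const seededDefaults = seedCommitMessageCallSite({
  workspaceGit: { commitMessageLLM: { enabled: true } },
});

// An explicit legacy key is carried over instead of the default.
const seededExplicit = seedCommitMessageCallSite({
  workspaceGit: { commitMessageLLM: { maxTokens: 200 } },
});

// No legacy block at all: the config is returned untouched.
const untouched = seedCommitMessageCallSite({});
```

This still leaves out workspaces that have no `commitMessageLLM` block on disk at all, which is one of the edge cases that makes Option 1 the lower-risk choice.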

},
);