Merged
5 changes: 5 additions & 0 deletions .changeset/tender-otters-pay.md
@@ -0,0 +1,5 @@
---
"kilo-code": patch
---

fix: cap qwen3-max-thinking max_tokens to provider limit
38 changes: 38 additions & 0 deletions src/shared/__tests__/api.spec.ts
@@ -211,6 +211,44 @@ describe("getModelMaxOutputTokens", () => {
})
})

// kilocode_change start
test("should cap qwen3-max-thinking to provider max output limit of 32,768", () => {
const model: ModelInfo = {
contextWindow: 300_000,
supportsPromptCache: false,
maxTokens: 200_000,
}

const result = getModelMaxOutputTokens({
modelId: "qwen/qwen3-max-thinking",
model,
settings: {},
format: "openrouter",
})

// 20% cap would be 60,000, but model-specific provider cap is 32,768.
expect(result).toBe(32_768)
})

test("should still honor lower context-based cap for qwen3-max-thinking", () => {
const model: ModelInfo = {
contextWindow: 100_000,
supportsPromptCache: false,
maxTokens: 200_000,
}

const result = getModelMaxOutputTokens({
modelId: "qwen/qwen3-max-thinking",
model,
settings: {},
format: "openrouter",
})

// 20% cap is 20,000 which is lower than 32,768.
expect(result).toBe(20_000)
})
Comment on lines +215 to +249
Copilot AI Feb 15, 2026
The new qwen3-max-thinking tests add Kilocode-specific behavior but aren’t marked with // kilocode_change comments. Please wrap these new test cases (or annotate the added lines) so downstream merges can distinguish fork changes from upstream Roo.

Author
Addressed in 13caa14: wrapped the two qwen3-max-thinking tests with // kilocode_change start/end markers in src/shared/__tests__/api.spec.ts.

// kilocode_change end

test("should handle GPT-5 models with various max token configurations", () => {
const testCases = [
{
18 changes: 17 additions & 1 deletion src/shared/api.ts
@@ -108,6 +108,9 @@ export const shouldUseReasoningEffort = ({
export const DEFAULT_HYBRID_REASONING_MODEL_MAX_TOKENS = 16_384
export const DEFAULT_HYBRID_REASONING_MODEL_THINKING_TOKENS = 8_192
export const GEMINI_25_PRO_MIN_THINKING_TOKENS = 128
// kilocode_change start
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768
// kilocode_change end

// Max Tokens

@@ -143,6 +146,10 @@ export const getModelMaxOutputTokens = ({
return ANTHROPIC_DEFAULT_MAX_TOKENS
}

// kilocode_change start
const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")
P1: Mark shared src changes with kilocode_change comments

The new qwen3-max-thinking logic is added in src/ without kilocode_change markers, but the repository guideline in /workspace/kilocode/AGENTS.md requires all core-extension edits under src/ to be wrapped so upstream fork merges can isolate Kilo-specific patches; leaving this unmarked increases the chance of merge conflicts or accidental overwrite during the scripted Roo sync process.

Author
Done in 13caa14. Added explicit // kilocode_change annotations for the qwen3-max-thinking logic in src/shared/api.ts and corresponding tests.

// kilocode_change end

// If model has explicit maxTokens, clamp it to 20% of the context window
// Exception: GPT-5 models should use their exact configured max output tokens
if (model.maxTokens) {
@@ -154,8 +161,17 @@
return model.maxTokens
}

const contextCappedMaxTokens = Math.min(model.maxTokens, Math.ceil(model.contextWindow * 0.2))

// kilocode_change start
// qwen3-max-thinking currently rejects values above 32,768 (upstream provider constraint).
if (isQwen3MaxThinkingModel) {
return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
}
Comment on lines 112 to +170
Copilot AI Feb 15, 2026

New Kilocode-specific logic/constants should be marked with // kilocode_change (or a start/end block) to keep future merges with upstream Roo manageable. Please annotate the newly added Qwen3 provider cap constant and the qwen3-max-thinking special-case branch accordingly.

Author
Addressed in 13caa14: added // kilocode_change markers around the new qwen3-specific constant and branch in src/shared/api.ts.

// kilocode_change end

// All other models are clamped to 20% of context window
- return Math.min(model.maxTokens, Math.ceil(model.contextWindow * 0.2))
+ return contextCappedMaxTokens
}

// For non-Anthropic formats without explicit maxTokens, return undefined
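Taken together, the capping behavior in this PR can be sketched as a small standalone function. This is a simplified illustration, not the real implementation: the hypothetical helper name `capMaxOutputTokens` and its flat parameter list are assumptions for the sketch, while the actual logic lives inside `getModelMaxOutputTokens` in `src/shared/api.ts` and handles additional cases (Anthropic defaults, GPT-5 exceptions, reasoning-budget settings).

```typescript
// Simplified sketch of the max_tokens capping logic from this PR.
// Hypothetical helper; the real code is getModelMaxOutputTokens in src/shared/api.ts.

const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768;

function capMaxOutputTokens(modelId: string, maxTokens: number, contextWindow: number): number {
	// General rule: clamp an explicit maxTokens to 20% of the context window.
	const contextCappedMaxTokens = Math.min(maxTokens, Math.ceil(contextWindow * 0.2));

	// qwen3-max-thinking currently rejects values above 32,768 (provider constraint),
	// so apply the stricter of the two caps.
	if (modelId.toLowerCase().includes("qwen3-max-thinking")) {
		return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT);
	}

	return contextCappedMaxTokens;
}

// Mirrors the two new test cases:
console.log(capMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 300_000)); // 32768 (provider cap wins over 60,000)
console.log(capMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 100_000)); // 20000 (context cap is already lower)
```

Note the ordering: the 20% context cap is computed first, and the provider limit is applied on top of it, so whichever cap is lower always wins.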