fix: cap qwen3-max-thinking max_tokens to provider limit #5885
Changes from all commits: 913a39a, 13caa14, b1d201b, c7d5865
New changeset file:

```diff
@@ -0,0 +1,5 @@
+---
+"kilo-code": patch
+---
+
+fix: cap qwen3-max-thinking max_tokens to provider limit
```
src/shared/api.ts:

```diff
@@ -108,6 +108,9 @@ export const shouldUseReasoningEffort = ({
 export const DEFAULT_HYBRID_REASONING_MODEL_MAX_TOKENS = 16_384
 export const DEFAULT_HYBRID_REASONING_MODEL_THINKING_TOKENS = 8_192
 export const GEMINI_25_PRO_MIN_THINKING_TOKENS = 128
+// kilocode_change start
+const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768
+// kilocode_change end

 // Max Tokens

@@ -143,6 +146,10 @@ export const getModelMaxOutputTokens = ({
 		return ANTHROPIC_DEFAULT_MAX_TOKENS
 	}

+	// kilocode_change start
+	const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")
```
Reviewer: The new …

Author: Done in 13caa14. Added explicit // kilocode_change annotations for the qwen3-max-thinking logic in src/shared/api.ts and corresponding tests.
```diff
+	// kilocode_change end
+
 	// If model has explicit maxTokens, clamp it to 20% of the context window
 	// Exception: GPT-5 models should use their exact configured max output tokens
 	if (model.maxTokens) {

@@ -154,8 +161,17 @@ export const getModelMaxOutputTokens = ({
 			return model.maxTokens
 		}

+		const contextCappedMaxTokens = Math.min(model.maxTokens, Math.ceil(model.contextWindow * 0.2))
+
+		// kilocode_change start
+		// qwen3-max-thinking currently rejects values above 32,768 (upstream provider constraint).
+		if (isQwen3MaxThinkingModel) {
+			return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
+		}
```
Comment on lines 112 to +170:
```diff
+		// kilocode_change end
+
 		// All other models are clamped to 20% of context window
-		return Math.min(model.maxTokens, Math.ceil(model.contextWindow * 0.2))
+		return contextCappedMaxTokens
 	}

 	// For non-Anthropic formats without explicit maxTokens, return undefined
```
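To make the new control flow concrete, here is a self-contained sketch of the capping behavior from the diff above, reduced to the three inputs the branch actually uses (the real `getModelMaxOutputTokens` takes a full model info object; the model ids and numbers below are illustrative, not taken from the real model registry):

```typescript
// Reduced sketch of the max-output-token capping logic. Assumption: only the
// modelId, configured maxTokens, and contextWindow matter for this branch.
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768

function capMaxOutputTokens(modelId: string, maxTokens: number, contextWindow: number): number {
	// All models are first clamped to 20% of the context window.
	const contextCappedMaxTokens = Math.min(maxTokens, Math.ceil(contextWindow * 0.2))

	// qwen3-max-thinking additionally rejects values above 32,768 upstream,
	// so the result is clamped once more for that model family.
	if (modelId.toLowerCase().includes("qwen3-max-thinking")) {
		return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
	}
	return contextCappedMaxTokens
}

// With a hypothetical 262_144-token context window, 20% is 52_429 tokens:
// qwen3-max-thinking is capped further to 32_768, other models keep 52_429.
console.log(capMaxOutputTokens("qwen3-max-thinking", 65_536, 262_144)) // 32768
console.log(capMaxOutputTokens("some-other-model", 65_536, 262_144)) // 52429
```

Computing the context cap first and then applying the provider-specific clamp keeps the 20% rule in one place, which is why the diff replaces the old inline `Math.min` return with the shared `contextCappedMaxTokens` variable.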
Reviewer: The new qwen3-max-thinking tests add Kilocode-specific behavior but aren't marked with // kilocode_change comments. Please wrap these new test cases (or annotate the added lines) so downstream merges can distinguish fork changes from upstream Roo.

Author: Addressed in 13caa14: wrapped the two qwen3-max-thinking tests with // kilocode_change start/end markers in src/shared/__tests__/api.spec.ts.
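The updated test file itself is not shown in this excerpt. A minimal sketch of the wrapping convention the reviewer asked for might look like the following; the helper name and values are assumptions, and the real suite exercises the actual `getModelMaxOutputTokens` through the project's test framework rather than bare checks:

```typescript
// Hypothetical stand-in for the fork-specific behavior under test.
function getQwen3MaxThinkingCap(contextCappedMaxTokens: number): number {
	return Math.min(contextCappedMaxTokens, 32_768)
}

// kilocode_change start
// Fork-specific checks: the start/end markers let downstream merges grep for
// Kilocode additions and keep them separate from upstream Roo code.
if (getQwen3MaxThinkingCap(52_429) !== 32_768) throw new Error("should clamp above the provider limit")
if (getQwen3MaxThinkingCap(16_384) !== 16_384) throw new Error("should pass smaller values through unchanged")
// kilocode_change end

console.log("qwen3-max-thinking cap checks passed")
```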