Fix/cerebras conservative max tokens #5036
sebastiand-cerebras wants to merge 9 commits into anomalyco:dev from
Conversation
> Wouldn't this kinda neuter a lot of models? Can you explain why you need this? Models like gpt-oss have 32k max completion output tokens, and opencode should be respecting that… What kind of plan are you on where you get rate-limited?
> Cerebras handles rate limiting differently from most providers: it estimates token usage upfront using the max_completion_tokens value, so if a client always sends 32k, each request is counted as if it might produce 32k tokens, even when the actual completion is much smaller. On Cerebras Code plans this causes users to hit rate limits very quickly in agentic coding workflows that make many short calls. A more conservative default such as 8,192 tokens gives a much smoother experience without materially limiting typical code completions.
extending to 16k
00637c0 to
71e0ba2
f1ae801 to
08fa7f7
> Closing this pull request because it has had no updates for more than 60 days. If you plan to continue working on it, feel free to reopen or open a new PR.
This PR adds a provider-specific configuration for Cerebras to optimize rate limit handling and integration tracking.
Key changes:
Testing:
Verified functionality with the following models: gpt-oss-120b, qwen-235, zai-glm4.6.