
Fix/cerebras conservative max tokens #5036

Closed
sebastiand-cerebras wants to merge 9 commits into anomalyco:dev from sebastiand-cerebras:fix/cerebras-conservative-max-tokens

Conversation

@sebastiand-cerebras
Contributor

sebastiand-cerebras commented Dec 4, 2025

This PR adds a specific configuration for the Cerebras provider to optimize rate limit handling and integration tracking.

Key changes:

  • Conservative Token Limit: Sets maxCompletionTokens to 16k. The Cerebras rate limiter estimates token consumption by reserving the full max_completion_tokens quota upfront. A conservative default avoids premature rate limiting when the actual completion is much smaller than the reserved quota.
  • Integration Header: Adds the X-Cerebras-3rd-Party-Integration: opencode header.
  • Configuration: Sets autoload: false.
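
The three changes above can be pictured as a single provider override. This is only a sketch: the `ProviderOverride` interface and the `effectiveMaxTokens` helper are hypothetical names made up for illustration and are not taken from the opencode codebase, and 16_384 is an assumed reading of "16k".

```typescript
// Hypothetical shape for illustration only; opencode's real provider
// config types and field layout may differ from this sketch.
interface ProviderOverride {
  autoload: boolean;
  headers: Record<string, string>;
  maxCompletionTokens: number;
}

// Values taken from the PR description. 16_384 is an assumed reading of "16k".
const cerebrasOverride: ProviderOverride = {
  autoload: false,
  headers: { "X-Cerebras-3rd-Party-Integration": "opencode" },
  maxCompletionTokens: 16_384,
};

// Hypothetical helper: cap a model's advertised output limit at the
// provider-level default before it is sent as max_completion_tokens.
function effectiveMaxTokens(modelLimit: number, override: ProviderOverride): number {
  return Math.min(modelLimit, override.maxCompletionTokens);
}
```

Under these assumptions, a model that advertises 32,768 output tokens would be sent with the capped value, while a model with a smaller limit keeps its own limit.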

Testing:
Verified functionality with the following models: gpt-oss-120b, qwen-235, zai-glm4.6

@rekram1-node
Collaborator

wouldn’t this kinda neuter a lot of models?

Can you explain why you need this? Models like gpt oss have 32k max completion output tokens, and opencode should be respecting that…

What kinda plan are you on where you get ratelimited?

@sebastiand-cerebras
Contributor Author

Cerebras handles rate limiting differently from most providers. It estimates token usage upfront from the max_completion_tokens value, so if a client always sends 32k, each request is counted as if it might produce 32k tokens, even when the actual completion is much smaller. On Cerebras Code plans, this causes users to hit rate limits very quickly in agentic coding workflows that make many short calls. A more conservative default such as 8,192 tokens gives a much smoother experience without materially limiting typical code completions.
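
As a rough illustration of the arithmetic, the sketch below assumes each request reserves its prompt tokens plus the full max_completion_tokens against a per-window token budget. The reservation formula, the `requestsPerWindow` name, and the budget and prompt sizes are all assumptions chosen for the example, not documented Cerebras limits.

```typescript
// Assumed model: every request reserves prompt tokens plus the full
// max_completion_tokens, regardless of how much is actually generated.
function requestsPerWindow(
  budgetTokens: number,        // hypothetical token budget per rate-limit window
  promptTokens: number,        // hypothetical prompt size per request
  maxCompletionTokens: number, // value sent as max_completion_tokens
): number {
  const reservedPerRequest = promptTokens + maxCompletionTokens;
  return Math.floor(budgetTokens / reservedPerRequest);
}

// Illustrative numbers only: a 1,000,000-token window and 2,000-token prompts.
const with32k = requestsPerWindow(1_000_000, 2_000, 32_768); // 28
const with8k = requestsPerWindow(1_000_000, 2_000, 8_192);   // 98
console.log({ with32k, with8k });
```

With these made-up numbers, lowering the cap from 32,768 to 8,192 lets about three and a half times as many short calls fit into the same window, which is the effect described for agentic workflows.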

@github-actions
Contributor

Closing this pull request because it has had no updates for more than 60 days. If you plan to continue working on it, feel free to reopen or open a new PR.

github-actions bot closed this Feb 28, 2026

2 participants