Lifei/fixed accumulated token count #6587
Conversation
Pull request overview
This PR fixes an issue with accumulated token count in streaming responses from OpenAI-compatible providers. The changes extract and update usage information from streaming chunks, ensuring that usage data with output tokens is correctly captured and yielded only once.
Changes:
- Added `extract_usage_with_output_tokens` helper function to filter usage data that includes output tokens (a sketch of its shape follows this list)
- Modified the streaming response handler to accumulate usage from chunks during tool call processing
- Refactored tests with helper functions and added comprehensive test coverage for multiple providers (OpenRouter, OpenAI GPT-5, Tetrate Claude)
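For orientation, here is a minimal sketch of what that helper might look like. The `StreamingChunk` and `Usage` shapes below are simplified assumptions for illustration (the real types live in `formats/openai.rs`), so the actual implementation may differ:

```rust
use serde::Deserialize;

// Assumed, simplified shapes; the real goose types are richer.
#[derive(Debug, Clone, Deserialize)]
struct Usage {
    prompt_tokens: Option<i64>,
    completion_tokens: Option<i64>,
    total_tokens: Option<i64>,
}

#[derive(Debug, Deserialize)]
struct StreamingChunk {
    usage: Option<Usage>,
}

/// Return the chunk's usage only when it actually reports output tokens.
/// Chunks whose `completion_tokens` is null carry no final usage yet, so
/// they are filtered out instead of being accumulated.
fn extract_usage_with_output_tokens(chunk: &StreamingChunk) -> Option<Usage> {
    chunk
        .usage
        .as_ref()
        .filter(|u| u.completion_tokens.is_some())
        .cloned()
}
```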
```rust
let tool_chunk: StreamingChunk = serde_json::from_str(line)
    .map_err(|e| anyhow!("Failed to parse streaming chunk: {}: {:?}", e, &line))?;

if let Some(chunk_usage) = extract_usage_witqh_output_tokens(&tool_chunk) {
```
Copilot AI commented on Jan 20, 2026:
Typo in function name: "extract_usage_witqh_output_tokens" should be "extract_usage_with_output_tokens".
Suggested change:
```diff
-if let Some(chunk_usage) = extract_usage_witqh_output_tokens(&tool_chunk) {
+if let Some(chunk_usage) = extract_usage_with_output_tokens(&tool_chunk) {
```
```diff
 #[tokio::test]
 async fn test_streamed_multi_tool_response_to_messages() -> anyhow::Result<()> {
-    let response_lines = r#"
+    let responqse_lines = r#"
```
Copilot AI commented on Jan 20, 2026:
Typo in variable name: "responqse_lines" should be "response_lines".
Suggested change:
```diff
-    let responqse_lines = r#"
+    let response_lines = r#"
```
```rust
    assert_eq!(usage.usage.total_tokens, Some(expected_total));
}

#[tokio::test]
```
I haven't looked at the details since this just feels like a large dump, but I'm not sure we need three tests here to verify that we're now doing the right thing?
The bug is caused by the different structures of the streaming chunks in the API responses, so I added tests with test data variations for the different providers. This will give us a harness for when we change the logic in the future.
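To make the variation concrete, here is a hypothetical, simplified pair of chunk shapes; the payloads below are illustrative assumptions, not the PR's actual fixtures (which cover OpenRouter, OpenAI GPT-5, and Tetrate Claude):

```rust
// OpenAI-style stream: usage arrives only once, in the final chunk.
const OPENAI_FINAL_CHUNK: &str =
    r#"{"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":34,"total_tokens":46}}"#;

// Databricks-Claude-style stream: every chunk repeats the same prompt_tokens,
// but only the chunk carrying finish_reason also has completion_tokens.
const CLAUDE_INTERMEDIATE_CHUNK: &str = r#"{"choices":[{"delta":{"content":"hi"},"finish_reason":null}],"usage":{"prompt_tokens":12,"completion_tokens":null}}"#;
const CLAUDE_FINAL_CHUNK: &str = r#"{"choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":34}}"#;
```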
jamadeo left a comment:
Is it the same across all providers @lifeizhou-ap? You mention databricks/claude, but since this code is in formats/openai.rs I'd be worried that we're missing some provider-specific logic for token tracking.
Yes, I was worried about this too, including in the last PR. I manually tested with different providers again, and I also added the chunk payloads to the new tests to make sure future changes won't break this. I'm not sure whether I understand your question correctly; happy to have a chat!
Summary
Fix the wrong accumulated token count for Databricks Claude models.
Why
For Databricks Claude models, the usage appears in every chunk with the same `prompt_token`, but only the last chunk, the one with `finish_reason`, has `completion_token`. With the change in PR #6493, for each API:
- Actual: we add `prompt_token` from every chunk.
- Expected: we should only add it once, which we can do by using the chunk with `finish_reason`.
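As an illustration of the intended behavior, here is a minimal sketch of the corrected accumulation, reusing the assumed `StreamingChunk`/`Usage` shapes and the `extract_usage_with_output_tokens` helper sketched earlier; this is not the PR's literal code:

```rust
// Take usage only from the chunk that reports completion tokens, so prompt
// tokens are counted once instead of once per chunk.
fn accumulate_usage(chunks: &[StreamingChunk]) -> Option<Usage> {
    let mut final_usage = None;
    for chunk in chunks {
        // Overwrite rather than sum: intermediate chunks whose
        // completion_tokens is null are filtered out by the helper.
        if let Some(usage) = extract_usage_with_output_tokens(chunk) {
            final_usage = Some(usage);
        }
    }
    final_usage
}
```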
What
Yield the usage only from the chunk that has `completion_token`. If `completion_token` is null, this means we don't have the usage yet.

Type of Change
AI Assistance
Testing
Unit tests and manual tests