-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
[Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking #23460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add _update_num_cached_tokens() method to track cached tokens from RequestOutput - Add _update_num_reasoning_tokens() method to track reasoning tokens based on: - Analysis channel content (parser.current_channel == 'analysis') - Tool directed messages - Integrate token tracking into append_output() methods for both context types - Cached tokens only tracked on first token in streaming mode Signed-off-by: George Nagy II <[email protected]>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds tracking for num_cached_tokens and num_reasoning_tokens to HarmonyContext and StreamingHarmonyContext. The changes look good and correctly implement the token counting logic. I have one suggestion to improve the readability of a complex condition in the new _update_num_reasoning_tokens method. By breaking down the condition into smaller, named variables, the code becomes easier to understand and maintain.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: George Nagy II <[email protected]>
heheda12345
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you very much.
|
@NagyGeorge Can you fix the pre-commit error? |
@heheda12345 yes I'm out of town right now so I should be able to within a couple days. |
Signed-off-by: Chen Zhang <[email protected]>
|
Thanks for letting me know. I've formatted it. |
…g_tokens tracking (vllm-project#23460) Signed-off-by: George Nagy II <[email protected]> Signed-off-by: Chen Zhang <[email protected]>
…g_tokens tracking (vllm-project#23460) Signed-off-by: George Nagy II <[email protected]> Signed-off-by: Chen Zhang <[email protected]>
Purpose
This implements tracking for num_cached_tokens and num_reasoning_tokens in the Response API's ResponseUsage object as requested in issue #23363.
Before/After:
Fixes #23363
Test Plan