
Conversation


@NagyGeorge NagyGeorge commented Aug 23, 2025

Purpose

This implements tracking for num_cached_tokens and num_reasoning_tokens in the Response API's ResponseUsage object as requested in issue #23363.

Before/After:

  • Before: num_cached_tokens and num_reasoning_tokens were always 0
  • After: These fields accurately reflect the actual cached and reasoning token usage

Fixes #23363

Test Plan

  • Pre-commit Validation: All pre-commit hooks pass
  • Existing Test Coverage: Rely on existing CI pipeline tests for HarmonyContext and StreamingHarmonyContext to validate no regressions

  - Add _update_num_cached_tokens() method to track cached tokens from RequestOutput
  - Add _update_num_reasoning_tokens() method to track reasoning tokens based on:
    - Analysis channel content (parser.current_channel == 'analysis')
    - Tool-directed messages
  - Integrate token tracking into append_output() methods for both context types
  - Cached tokens are only tracked on the first token in streaming mode
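The logic described above can be sketched roughly as follows. This is an illustrative, simplified stand-in, not the actual vLLM code: the method names mirror the PR (_update_num_cached_tokens, _update_num_reasoning_tokens), but the RequestOutput and parser shapes here are hypothetical minimal versions for demonstration.

```python
from dataclasses import dataclass, field

@dataclass
class RequestOutput:
    # Simplified stand-in for the engine's RequestOutput
    num_cached_tokens: int = 0

@dataclass
class Parser:
    # Simplified stand-in for the Harmony parser; tracks the active channel
    current_channel: str = "final"

@dataclass
class HarmonyContext:
    num_cached_tokens: int = 0
    num_reasoning_tokens: int = 0
    first_tok_of_message: bool = True
    parser: Parser = field(default_factory=Parser)

    def _update_num_cached_tokens(self, request_output: RequestOutput) -> None:
        # Cached-token counts come from the engine's RequestOutput.
        # In streaming mode, only record them on the first token of a message.
        if self.first_tok_of_message:
            self.num_cached_tokens = request_output.num_cached_tokens
            self.first_tok_of_message = False

    def _update_num_reasoning_tokens(self, num_new_tokens: int = 1) -> None:
        # Tokens emitted on the 'analysis' channel count as reasoning tokens.
        if self.parser.current_channel == "analysis":
            self.num_reasoning_tokens += num_new_tokens
```

In the streaming case, calling _update_num_cached_tokens on every token would be a no-op after the first, which matches the "only tracked on first token" bullet above.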

Signed-off-by: George Nagy II <[email protected]>
@NagyGeorge NagyGeorge requested a review from aarnphm as a code owner August 23, 2025 03:41
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only fastcheck CI will run, which executes a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the frontend label Aug 23, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds tracking for num_cached_tokens and num_reasoning_tokens to HarmonyContext and StreamingHarmonyContext. The changes look good and correctly implement the token counting logic. I have one suggestion to improve the readability of a complex condition in the new _update_num_reasoning_tokens method. By breaking down the condition into smaller, named variables, the code becomes easier to understand and maintain.

NagyGeorge and others added 2 commits August 23, 2025 03:25
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: George Nagy II <[email protected]>

@heheda12345 heheda12345 left a comment


LGTM! Thank you very much.

@heheda12345

@NagyGeorge Can you fix the pre-commit error?

@heheda12345 heheda12345 changed the title [Feature] Add support for num_cached_tokens and num_reasoning_tokens tracking [Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking Sep 3, 2025
@heheda12345 heheda12345 added the gpt-oss Related to GPT-OSS models label Sep 3, 2025
@NagyGeorge

@NagyGeorge Can you fix the pre-commit error?

@heheda12345 Yes, I'm out of town right now, but I should be able to get to it within a couple of days.

@heheda12345

Thanks for letting me know. I've formatted it.

@heheda12345 heheda12345 enabled auto-merge (squash) September 3, 2025 18:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 3, 2025
@heheda12345 heheda12345 merged commit 36c260d into vllm-project:main Sep 3, 2025
45 of 47 checks passed
@NagyGeorge NagyGeorge deleted the feature/work-branch branch September 4, 2025 18:27
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…g_tokens tracking (vllm-project#23460)

Signed-off-by: George Nagy II <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…g_tokens tracking (vllm-project#23460)

Signed-off-by: George Nagy II <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>


Development

Successfully merging this pull request may close these issues.

[Feature][Response API] Support num_cached_tokens and num_reasoning_tokens in ResponseUsage
