Skip to content

Conversation

@pskiran1
Copy link
Member

@pskiran1 pskiran1 commented Jun 25, 2025

What does the PR do?

  • Enabled usage support by default for non-streaming requests
  • Added support for streaming options and usage in streaming requests
  • Implemented test cases to validate the new functionality

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: 30555247

Caveats:

Background

By default, the OpenAI API supports usage for non-streaming requests. For streaming requests, we need to enable stream_options: {"include_usage": true}.
https://platform.openai.com/docs/api-reference/chat/create
https://platform.openai.com/docs/api-reference/chat-streaming

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@pskiran1 pskiran1 changed the title Add support for usage in the OpenAI frontend vLLM backend feat: Add support for usage in the OpenAI frontend vLLM backend Jun 25, 2025
@pskiran1 pskiran1 added PR: feat A new feature openai OpenAI related labels Jun 25, 2025
@pskiran1 pskiran1 marked this pull request as ready for review June 25, 2025 11:03
@richardhuo-nv
Copy link
Contributor

richardhuo-nv commented Jun 26, 2025

NVIDIA/TensorRT-LLM#5445
Could you add a TODO and create an ticket to enable the tensorrt_llm backend's usage when tensorrt_llm version reached a new version that contain this commit? Probably the version 0.21 I think.

@richardhuo-nv
Copy link
Contributor

Does the change include reporting each streaming chunk's usage? If not, have we verified with the requestor that if per chunk usage is needed?

@pskiran1
Copy link
Member Author

pskiran1 commented Jul 2, 2025

Does the change include reporting each streaming chunk's usage? If not, have we verified with the requestor that if per chunk usage is needed?

For streaming, it reports the usage for the entire request in an additional chunk before the final response (choices will be empty), similar to the OpenAI API. More details are provided below. The user has also confirmed that this aligns with their requirements.

https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options

image

@pskiran1 pskiran1 requested a review from richardhuo-nv July 2, 2025 13:52
Copy link
Contributor

@richardhuo-nv richardhuo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Job! Thanks!

@pskiran1 pskiran1 merged commit d17512b into main Jul 3, 2025
3 checks passed
@pskiran1 pskiran1 deleted the spolisetty_openai_usage branch July 3, 2025 05:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

openai OpenAI related PR: feat A new feature

Development

Successfully merging this pull request may close these issues.

3 participants