feat: Add support for usage in the OpenAI frontend vLLM backend #8264
Conversation
Co-authored-by: richardhuo-nv <[email protected]>
NVIDIA/TensorRT-LLM#5445

Does the change include reporting each streaming chunk's usage? If not, have we verified with the requester whether per-chunk usage is needed?

For streaming, it reports usage according to the `stream_options` behavior documented at https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options: when `include_usage` is enabled, a single final chunk carries the usage statistics for the entire request.
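A minimal sketch of how a client would read that final usage chunk, assuming an OpenAI-compatible frontend at `http://localhost:9000/v1` and a model named `llama-3.1-8b-instruct` (both placeholders, not values taken from this PR):

```python
# Minimal sketch: reading per-request usage from a streaming chat
# completion. The base_url, api_key, and model name are placeholder
# assumptions, not values from this PR.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    # Opt in to usage reporting for streaming requests.
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Per the OpenAI spec, usage arrives on one final chunk (with an
    # empty `choices` list) rather than on every content chunk.
    if chunk.usage is not None:
        print(chunk.usage)
```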
Great Job! Thanks!

What does the PR do?
- Adds `usage` support by default for non-streaming requests (see the sketch below).
- Adds `stream_options` and `usage` support for streaming requests.
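A minimal sketch of the default non-streaming behavior; the endpoint, API key, and model name are placeholder assumptions, not values from this PR:

```python
# Minimal sketch: with this change, a plain (non-streaming) request
# returns token accounting in `usage` by default. The base_url,
# api_key, and model name are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="unused")

completion = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
# usage carries prompt_tokens, completion_tokens, and total_tokens.
print(completion.usage)
```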
Checklist

- PR title is of the format `<commit_type>: <Title>`.
- Commit Type: check the conventional commit type box and add the matching label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
By default, the OpenAI API supports `usage` for non-streaming requests. For streaming requests, we need to enable `stream_options: {"include_usage": true}`.

https://platform.openai.com/docs/api-reference/chat/create
https://platform.openai.com/docs/api-reference/chat-streaming
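For reference, a minimal raw-HTTP sketch of the two request shapes described above, assuming a hypothetical OpenAI-compatible server at `localhost:9000` (host, port, and model name are placeholder assumptions):

```python
# Minimal sketch of the two request shapes described above. The host,
# port, and model name are placeholder assumptions.
import requests

url = "http://localhost:9000/v1/chat/completions"
messages = [{"role": "user", "content": "Hello!"}]

# Non-streaming: `usage` is present in the response by default.
resp = requests.post(url, json={"model": "llama-3.1-8b-instruct",
                                "messages": messages})
print(resp.json()["usage"])

# Streaming: opt in via stream_options; the final SSE chunk before
# "[DONE]" carries usage for the entire request.
resp = requests.post(
    url,
    json={
        "model": "llama-3.1-8b-instruct",
        "messages": messages,
        "stream": True,
        "stream_options": {"include_usage": True},
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())
```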
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)