
Add stream_options support according to OpenAI API #1552

Open
wants to merge 2 commits into base: main
Conversation

@tpfau tpfau commented Jun 25, 2024

This PR adds the stream_options and include_usage fields, as defined in the OpenAI API reference, to the server.
They are disabled by default; when activated, the final chunk returned during streaming supplies the usage information.
This addresses the question raised in #1461 and the request in #1498.
I moved the chunk generation code into its own function, since duplicating it inline would have led to bloated code.
I did not update the places where stream is clearly set to false.

I'm happy to update the PR with any necessary or requested modifications with respect to code style, or if you think all chunk-generation call sites should be changed to use functions (or that the chunk generation function should gain more options). Just let me know.

Closes #1498
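For reference, this is roughly what the feature looks like on the wire under the OpenAI API spec. This is a sketch with placeholder values (model name, IDs, and token counts are illustrative), not output from this PR:

```python
# Hypothetical request illustrating the fields this PR adds; field names
# follow the OpenAI API reference, values are placeholders.
request = {
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    # New in this PR: ask the server for a final chunk with token usage.
    "stream_options": {"include_usage": True},
}

# Per the OpenAI spec, the extra last chunk has an empty `choices` list
# and a populated `usage` object.
final_chunk = {
    "id": "chatcmpl-xyz",
    "object": "chat.completion.chunk",
    "choices": [],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21,
    },
}
```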

@tpfau tpfau changed the title This commit adds streaming options Add stream_options support according to OpenAI API Jun 25, 2024
@abetlen
Owner

abetlen commented Jul 2, 2024

@tpfau thank you for starting on this. I'll review in more depth, but my initial request is that, instead of stream_include_usage, we just add stream_options directly to the methods in the Llama class, as those should also mirror the OpenAI API.
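The suggestion amounts to a signature roughly like the following. This is a hypothetical sketch of the requested shape, not the actual llama-cpp-python code:

```python
from typing import Any, Dict, List, Optional

def create_chat_completion(
    messages: List[Dict[str, str]],
    stream: bool = False,
    # Suggested shape: accept the whole OpenAI-style stream_options dict
    # instead of a bespoke stream_include_usage flag.
    stream_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: only shows how the option would be unpacked."""
    include_usage = bool(
        stream and stream_options and stream_options.get("include_usage")
    )
    return {"stream": stream, "include_usage": include_usage}
```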

@tpfau
Author

tpfau commented Jul 4, 2024

@abetlen I hope this change is what you had in mind.
I'm admittedly unsure how to properly add the description of the include_usage field to the API (for now I tried to stick to how it was done for the completion request specs).
I also wanted to avoid adding the stream-option check in every single function, so I kept it in the "main" functionalities and left the addition of the stream_include_usage parameter in the conversion functions.
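The chunk-generation helper discussed above could look roughly like this. It is a simplified sketch with hypothetical names and a fake per-piece token count; the actual function in the PR will differ:

```python
from typing import Any, Dict, Iterator, List

def generate_chunks(
    pieces: List[str],
    prompt_tokens: int,
    include_usage: bool = False,
) -> Iterator[Dict[str, Any]]:
    """Yield content chunks; optionally append a final usage-only chunk."""
    completion_tokens = 0
    for piece in pieces:
        completion_tokens += 1  # simplified: one token per piece
        yield {
            "object": "chat.completion.chunk",
            "choices": [{"delta": {"content": piece}}],
        }
    if include_usage:
        # Extra last chunk per the OpenAI spec: empty choices, usage filled in.
        yield {
            "object": "chat.completion.chunk",
            "choices": [],
            "usage": {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
        }
```

Keeping the check in one generator like this is what avoids repeating the stream-option logic at every call site.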

@tpfau
Author

tpfau commented Aug 5, 2024

I would really like to see this in production: this feature can be quite important for applications that need some kind of usage control, or that simply report usage to their users. People can of course work around its absence, but that requires additional client-side code that re-tokenizes the output, duplicating computation the model has already performed.

@tpfau
Author

tpfau commented Aug 28, 2024

@abetlen Any news on this?

Successfully merging this pull request may close these issues.

Include usage key in create_completion when streaming