Implement Token Usage Tracking #27

ZachZimm · 2024-08-23T04:56:13Z

This PR resolves issue #8 by adding token usage information to the response messages. The response conforms to the schema I found here for non-streaming and here for streaming, at the very bottom.
In this implementation, all streamed chunks include up-to-date information about token usage.

This streaming implementation differs slightly from openai's in that their api accepts a stream_options: dict parameter which can include "include_usage"=True, otherwise it will not stream back usage information. Additionally, they put forth a few rules about which chunks of the stream will contain usage data.

I did not add a count_tokens function as suggested in the issue, as it seems better to do this during generation since we are already using the tokenizer and in the case of non-streaming we already had an array of prompt tokens.

Blaizzy · 2024-08-24T05:51:26Z

Thank you very much for the great PR, @ZachZimm!

Blaizzy · 2024-08-24T05:58:51Z

This streaming implementation differs slightly from openai's in that their api accepts a stream_options: dict parameter which can include "include_usage"=True, otherwise it will not stream back usage information.

Additionally, they put forth a few rules about which chunks of the stream will contain usage data.

Can we implement this as well?

We need 1, to make it easier to define when we need it.

And 2 is a simple implementation, all you need to do is to pass the usage after streaming all tokens and just before yielding done (i.e., https://github.com/Blaizzy/fastmlx/blob/main/fastmlx/utils.py#L407).

fastmlx/utils.py

Blaizzy · 2024-08-24T06:02:50Z

Please run pre-commit to fix styling check errors.

ZachZimm · 2024-08-25T04:05:54Z

I pushed a new commit based on your feedback. I did run pre-commit this time (sorry, I'd never heard of it before) and there shouldn't be any extra comments or prints.

…ation and send

Blaizzy · 2024-09-10T12:34:33Z

@ZachZimm please update the tests to support token info, this is what causing them to fail.

ZachZimm added 2 commits August 22, 2024 21:32

Added token usage tracking in accordance with OpenAI API spec

617b559

Removed some unneeded, commented out code

8e16aeb

Blaizzy requested changes Aug 24, 2024

View reviewed changes

fastmlx/utils.py Outdated Show resolved Hide resolved

fastmlx/utils.py Outdated Show resolved Hide resolved

fastmlx/utils.py Outdated Show resolved Hide resolved

fastmlx/utils.py Show resolved Hide resolved

ZachZimm added 2 commits August 24, 2024 20:56

Added optional dict to with option.

c4df21f

Removed extraneous comment

49bfbd5

Fixed indentation error in lm_stream_generator during final chunk cre…

918ec65

…ation and send

updated tests

3358783

ZachZimm requested a review from Blaizzy September 24, 2024 01:09

Merge branch 'main' into main

c999cd2

Blaizzy merged commit cd199d8 into arcee-ai:main Oct 29, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Token Usage Tracking #27

Implement Token Usage Tracking #27

ZachZimm commented Aug 23, 2024

Blaizzy commented Aug 24, 2024

Blaizzy commented Aug 24, 2024 •

edited

Loading

Blaizzy commented Aug 24, 2024

ZachZimm commented Aug 25, 2024

Blaizzy commented Sep 10, 2024

Implement Token Usage Tracking #27

Implement Token Usage Tracking #27

Conversation

ZachZimm commented Aug 23, 2024

Blaizzy commented Aug 24, 2024

Blaizzy commented Aug 24, 2024 • edited Loading

Blaizzy commented Aug 24, 2024

ZachZimm commented Aug 25, 2024

Blaizzy commented Sep 10, 2024

Blaizzy commented Aug 24, 2024 •

edited

Loading