Conversation
Force-pushed dffc0ea to c636d4a
// NewPromptTokenCounter takes a list of openai.ChatCompletionMessage and
// computes how many tokens are used by sending those messages to the model.
func NewPromptTokenCounter(prompt []openai.ChatCompletionMessage) (*PromptTokenCounter, error) {
Can this function return the count only? The PromptTokenCounter sounds a bit redundant.
I merged the two static counter types, got rid of the struct, and answered part of the comment here: #29224 (comment)
// NewSynchronousTokenCounter takes the completion request output and
// computes how many tokens were used by the model to generate this result.
func NewSynchronousTokenCounter(completion string) (*SynchronousTokenCounter, error) {
nit: The same as above. Do we need to return the token counter struct where we could only return the count?
The point of abstracting the different types of token counting behind an interface was to hide whether token counting is immediate or asynchronous. From the caller's point of view, everything is a promise of a token count, even if we already computed it.
For prompt tokens we could return the count directly as they are always synchronously countable. For completion tokens that's not the case.
To simplify the prompt and synchronous counters I removed the struct and made them an integer type alias.
go func() {
	defer close(parts)
	defer func() {
		errCount := streamingTokenCounter.Finish()
I don't think I fully understand this pattern. You create the counter, you add all tokens, and then you call TokenCount() in one thread and Finish() in the other? Why can't TokenCount() just count all tokens and return the value?
I revamped the logic to remove the wait logic and the Finish() function. I thought we were streaming completely asynchronously to the front end, but we are waiting for the stream to end in assist.go. Any TokenCount() invocation after this line will return the correct count.
justinas left a comment:
Besides Jakub's notes, generally looks good.
// parseJSONFromModel parses a JSON object from the model output and attempts to sanitize contaminant text
// to avoid triggering self-correction due to some natural language being bundled with the JSON.
// The output type is generic, and thus the structure of the expected JSON varies depending on T.
func parseJSONFromModel[T any](text string) (T, *invalidOutputError) {
Is this intentional? I think the best practice is to always return error and let the caller cast if necessary.
I added this some time ago; it is intentional. That may be the best practice, but when we have a single error type and the function is called from one or two places privately, I think casting to error is detrimental to working effectively with the code: we throw away information, force ourselves to write boilerplate for handling the other case, and have to remember to update it without compiler errors when we add new special types.
Here's a playground example demonstrating what happens without this change: https://go.dev/play/p/lD_J3gOIccf
By implicitly setting the err type to error a few lines before, I changed the behaviour of the error check and either had to change the function signature, or store the returned error in a new variable whose type is not error. I preferred the first solution as this is a footgun and it took me 20 minutes to understand why this was happening.
// NewPromptTokenCounter takes a list of openai.ChatCompletionMessage and
// computes how many tokens are used by sending those messages to the model.
func NewPromptTokenCounter(prompt []openai.ChatCompletionMessage) (*PromptTokenCounter, error) {
Do we need to split this into two classes and add a relatively large amount of boilerplate compared to the business logic? Would it be sufficient to maintain one token counter class that keeps track of both? It's just a lot of newlines, godocs, and various misc methods that don't really do anything.
I merged prompt and synchronous token counters in the same type and eliminated the struct.
jakule left a comment:
Looks good, thanks for all the fixes.
Force-pushed ea40b04 to b283dc4
With the actor model, tokens can be used in multiple ways (picking tools, invoking them, ...), which don't necessarily end up in a final action (sometimes we return a nextStep instead). Streaming responses were another challenge: the agent returned without the completion being over (it returned a routine streaming the deltas sent by the model). This PR introduces a TokenCounter interface that abstracts synchronous and asynchronous token counting. All token-consuming operations must return a TokenCounter. TokenCounters are stored in the agent state and returned once the agent exits. Finally, the token counters are evaluated asynchronously to give the streaming completion requests enough time to finish.
Force-pushed b283dc4 to d18563a
@hugoShaka See the table below for backport results.
[image: backport results table]
Fixes https://github.com/gravitational/teleport.e/issues/1805
This PR refactors token counting by decorrelating token count and message responses. With the actor model, tokens can be used in multiple ways (picking tools, invoking them, ...), which don't necessarily end up in a final action (sometimes we return a nextStep instead). Streaming responses were another challenge: the agent returned without the completion being over (it returned a routine streaming the deltas sent by the model).
This PR introduces a TokenCounter interface that abstracts synchronous and asynchronous token counting. All token-consuming operations must return a TokenCounter. TokenCounters are stored in the agent state and returned once the agent exits. Finally, the token counters are evaluated asynchronously to give the streaming completion requests enough time to finish.