Conversation
Force-pushed dffc0ea to c636d4a
// NewPromptTokenCounter takes a list of openai.ChatCompletionMessage and
// computes how many tokens are used by sending those messages to the model.
func NewPromptTokenCounter(prompt []openai.ChatCompletionMessage) (*PromptTokenCounter, error) {
Can this function return the count only? The PromptTokenCounter sounds a bit redundant.
I merged the two static counter types, got rid of the struct, and answered part of the comment here: #29224 (comment)
// NewSynchronousTokenCounter takes the completion request output and
// computes how many tokens were used by the model to generate this result.
func NewSynchronousTokenCounter(completion string) (*SynchronousTokenCounter, error) {
nit: The same as above. Do we need to return the token counter struct where we could only return the count?
The point of abstracting the different types of token counting behind an interface was to hide whether token counting is immediate or asynchronous. From the caller's point of view, everything is a promise of a token count, even if we already computed it.
For prompt tokens we could return the count directly as they are always synchronously countable. For completion tokens that's not the case.
To simplify the prompt and synchronous counters I removed the struct and made them an integer type alias.
go func() {
	defer close(parts)
	defer func() {
		errCount := streamingTokenCounter.Finish()
I don't think I fully understand this pattern. You create the counter, you add all tokens, and then you call TokenCount() in one thread and Finish() in the other? Why can't TokenCount() just count all tokens and return the value?
I revamped the logic to remove the wait logic and the Finish() function. I thought we were streaming completely asynchronously to the front end, but we are waiting for the stream to end in assist.go. Any TokenCount() invocation after this line will return the correct count.
justinas left a comment:
Besides Jakub's notes, generally looks good.
// parseJSONFromModel parses a JSON object from the model output and attempts to sanitize contaminant text
// to avoid triggering self-correction due to some natural language being bundled with the JSON.
// The output type is generic, and thus the structure of the expected JSON varies depending on T.
func parseJSONFromModel[T any](text string) (T, *invalidOutputError) {
Is this intentional? I think the best practice is to always return error and let the caller cast if necessary.
I added this some time ago; it is intentional. That may be the best practice, but when we have a single error type and the function is called from one or two places privately, I think casting to error is detrimental to working effectively with the code: we throw away information, force ourselves to write boilerplate for handling the other case, and have to remember to update it without compiler errors when we add new special types.
Here's a playground example demonstrating what happens without this change: https://go.dev/play/p/lD_J3gOIccf
By implicitly setting the err type to error a few lines before, I changed the behaviour of the error check and either had to change the function signature, or store the returned error in a new variable whose type is not error. I preferred the first solution as this is a footgun and it took me 20 minutes to understand why this was happening.
// NewPromptTokenCounter takes a list of openai.ChatCompletionMessage and
// computes how many tokens are used by sending those messages to the model.
func NewPromptTokenCounter(prompt []openai.ChatCompletionMessage) (*PromptTokenCounter, error) {
Do we need to split this into two classes and add a relatively large amount of boilerplate compared to the business logic? Would it be sufficient to maintain one token counter class that keeps track of both? It's just a lot of newlines, godocs, and various misc methods that don't really do anything.
I merged prompt and synchronous token counters in the same type and eliminated the struct.
jakule left a comment:
Looks good, thanks for all the fixes.
Force-pushed ea40b04 to b283dc4
With the actor model, tokens can be used in multiple ways (picking tools, invoking them, ...), which don't necessarily end up in a final action (sometimes we return a nextStep instead). Streaming responses were another challenge: the agent returned without the completion being over (it returned a routine streaming the deltas sent by the model). This PR introduces a TokenCounter interface that abstracts synchronous and asynchronous token counting. All token-consuming operations must return a TokenCounter. TokenCounters are stored in the agent state and returned once the agent exits. Finally, the token counters are evaluated asynchronously to give the streaming completion requests enough time to finish.
Force-pushed b283dc4 to d18563a
@hugoShaka See the table below for backport results.
[image: backport results table]
Fixes https://github.com/gravitational/teleport.e/issues/1805
This PR refactors token counting by decorrelating token count and message responses. With the actor model, tokens can be used in multiple ways (picking tools, invoking them, ...), which don't necessarily end up in a final action (sometimes we return a nextStep instead). Streaming responses were another challenge: the agent returned without the completion being over (it returned a routine streaming the deltas sent by the model).
This PR introduces a TokenCounter interface that abstracts synchronous and asynchronous token counting. All token-consuming operations must return a TokenCounter. TokenCounters are stored in the agent state and returned once the agent exits. Finally, the token counters are evaluated asynchronously to give the streaming completion requests enough time to finish.