Skip to content

Add rate limiting to Assist#26011

Merged
justinas merged 10 commits intomasterfrom
justinas/assist-rate-limiter
May 15, 2023
Merged

Add rate limiting to Assist#26011
justinas merged 10 commits intomasterfrom
justinas/assist-rate-limiter

Conversation

@justinas
Copy link
Copy Markdown
Contributor

@justinas justinas commented May 10, 2023

Rate limit Assist in Cloud based on the amount of tokens used (cluster-wide). This is a naive implementation that only stores the counter in memory.

Comment thread lib/web/apiserver.go Outdated
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread lib/web/apiserver.go Outdated
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this instead of oxy ratelimiter, as the latter is quite tightly coupled with individual http.Request-s.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this PR command as a code command too? The reasoning won't be easily visible to people after this PR is merged.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Comment thread lib/web/assistant.go Outdated
@justinas justinas marked this pull request as ready for review May 10, 2023 17:36
@justinas justinas requested review from hugoShaka and jakule May 10, 2023 17:36
@github-actions github-actions Bot requested review from codingllama and tigrato May 10, 2023 17:39
@justinas justinas changed the base branch from dev-ai to master May 11, 2023 13:58
@justinas justinas changed the base branch from master to dev-ai May 11, 2023 14:00
@justinas justinas force-pushed the justinas/assist-rate-limiter branch from f6788a8 to ed8a4fd Compare May 12, 2023 14:20
@justinas justinas changed the base branch from dev-ai to jakule/ai-web-endpoints May 12, 2023 14:20
Copy link
Copy Markdown
Contributor

@jakule jakule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinas Can we add some tests to it?

Comment thread lib/web/apiserver.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this PR command as a code command too? The reasoning won't be easily visible to people after this PR is merged.

Comment thread lib/web/apiserver.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue to trace it? If yes, can you link it here?

Comment thread lib/web/assistant.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be an assistant message. We should create a new message type, so the UI can react to it instead of displaying an error as a regular message. We should also close the WS connection as a client should not write anything more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made a new message type, however I kept the frontend behavior the same for now. WS connection now closes

Comment thread lib/web/assistant.go Outdated
Base automatically changed from jakule/ai-web-endpoints to master May 13, 2023 05:19
@justinas justinas force-pushed the justinas/assist-rate-limiter branch from ed8a4fd to b598ee0 Compare May 15, 2023 12:16
@justinas
Copy link
Copy Markdown
Contributor Author

@justinas Can we add some tests to it?

Added a test case where rate limit is hit.

@justinas justinas requested a review from jakule May 15, 2023 12:32
@justinas justinas force-pushed the justinas/assist-rate-limiter branch from 0742181 to c52f349 Compare May 15, 2023 12:36
@justinas justinas requested a review from ryanclark May 15, 2023 12:38
Copy link
Copy Markdown
Contributor

@jakule jakule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the backend changes, but please wait for @ryanclark to approve the UI changes.

): Promise<MessagesAction> {
if (message.type === 'CHAT_MESSAGE_ASSISTANT') {
if (
message.type === 'CHAT_MESSAGE_ASSISTANT' ||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanclark Can you take a look at it?

Comment thread lib/web/assistant.go
TotalTokens: int64(usedTokens.Prompt + usedTokens.Competition),
TotalTokens: int64(usedTokens.Prompt + usedTokens.Completion),
PromptTokens: int64(usedTokens.Prompt),
CompletionTokens: int64(usedTokens.Competition),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, sorry for that 😅

Comment thread lib/web/assistant.go Outdated
// Try to consume a small amount of tokens first.
const lookaheadTokens = 100
if !h.assistantLimiter.AllowN(time.Now(), lookaheadTokens) {
err := onMessageFn(assist.MessageKindUIMessage, []byte("You have reached the rate limit. Please try again later."), h.clock.Now().UTC())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we send some pot-hog event when this happens? Maybe this question doesn't make sense, but I've got no idea how we are using post-hog now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea, I'll mark it down somewhere. Just don't want to complicate this PR further for now.

Comment thread lib/web/apiserver.go
// SSOLoginFailureMessage is a generic error message to avoid disclosing sensitive SSO failure messages.
SSOLoginFailureMessage = "Failed to login. Please check Teleport's log for more details."

assistantTokensPerHour = 140
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godoc

Comment thread lib/web/apiserver.go
assistantTokensPerHour = 140
// assistantLimiterRate is the rate (in tokens per second)
// at which tokens for the assistant rate limiter are replenished
assistantLimiterRate = rate.Limit(assistantTokensPerHour / float64(time.Hour/time.Second))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
assistantLimiterRate = rate.Limit(assistantTokensPerHour / float64(time.Hour/time.Second))
assistantLimiterRate = rate.Limit(assistantTokensPerHour / time.Hour.Seconds())

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Hour.Seconds() sadly can not be used in a const definition

@public-teleport-github-review-bot public-teleport-github-review-bot Bot removed the request for review from hugoShaka May 15, 2023 13:26
Copy link
Copy Markdown
Member

@ryanclark ryanclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we should probably style CHAT_MESSAGE_UI messages a bit differently than normal messages to show that it's an error or something, but that can be done in another PR

Comment thread lib/assist/assist.go Outdated
// MessageKindSystemMessage is the type of Assist message that contains the system message.
MessageKindSystemMessage MessageType = "CHAT_MESSAGE_SYSTEM"
// MessageKindUIMessage is the type of Assist message that is presented to user as information, but not stored persistently in the conversation. This can include backend error messages and the like.
MessageKindUIMessage MessageType = "CHAT_MESSAGE_UI"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this ever be anything other than an error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly dunno, for now this rate-limit is the only case 🤷

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinas @ryanclark Can we just change the name to CHAT_MESSAGE_ERROR and keep using that type for all assist-related messages?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we should

@justinas justinas enabled auto-merge May 15, 2023 14:06
@justinas justinas force-pushed the justinas/assist-rate-limiter branch from e1ab55c to 9b5f09e Compare May 15, 2023 14:49
@justinas justinas added this pull request to the merge queue May 15, 2023
Merged via the queue into master with commit 21c534b May 15, 2023
@justinas justinas deleted the justinas/assist-rate-limiter branch May 15, 2023 16:56
@public-teleport-github-review-bot
Copy link
Copy Markdown

@justinas See the table below for backport results.

Branch Result
branch/v12 Failed

jakule pushed a commit that referenced this pull request May 15, 2023
* Add rate limiting to Assist

* Only rate limit Assist in Cloud

* Add a comment to assistantLimiter

* Fixes after rebase

* Add 'rate-limited' test case to assistant_test

* Handle CHAT_MESSAGE_UI in Assist web UI

* Add godoc

* CHAT_MESSAGE_UI -> CHAT_MESSAGE_ERROR

* Run assistant test cases in parallel
jakule added a commit that referenced this pull request May 15, 2023
* Add rate limiting to Assist (#26011)

* Add rate limiting to Assist

* Only rate limit Assist in Cloud

* Add a comment to assistantLimiter

* Fixes after rebase

* Add 'rate-limited' test case to assistant_test

* Handle CHAT_MESSAGE_UI in Assist web UI

* Add godoc

* CHAT_MESSAGE_UI -> CHAT_MESSAGE_ERROR

* Run assistant test cases in parallel

* Fix incorrect merge

---------

Co-authored-by: Justinas Stankevičius <justinas@users.noreply.github.com>
jakule pushed a commit that referenced this pull request Jun 2, 2023
* Add rate limiting to Assist

* Only rate limit Assist in Cloud

* Add a comment to assistantLimiter

* Fixes after rebase

* Add 'rate-limited' test case to assistant_test

* Handle CHAT_MESSAGE_UI in Assist web UI

* Add godoc

* CHAT_MESSAGE_UI -> CHAT_MESSAGE_ERROR

* Run assistant test cases in parallel
jakule added a commit that referenced this pull request Jun 3, 2023
* Assist - API implementation (#25810)

* Assist - API implementation

* Add Assistant CRUD test

* Move assistant API to a separate file.

* Rename GRPC methods
Fix comments

* Address PR comments

* Update GRPC API - addressing code review comments

* Move assist API to a new GRPC service

* Add Assistant RBAC rules to this backport.

* Add missing license

* Update comment.

* Move username and ID out of AssistantMessage (#25964)

* Move username and ID out of AssistantMessage

#25810 added username and conversation ID to AssistantMessage. They aren't needed inside the message and can be moved out of it.
This PR also fixes ` grpc: error while marshaling: proto: Marshal called with nil` in `CreateAssistantMessage`.

Note: It's safe to change protobuf numbers as this code hasn't been released or backported yet.

* Fix test

* make grpc

* Assist - Configuration and usage (#25953)

* Assist - Configuration and usage

* Add network config test

* Add config test

* Run GCI

* Address review comments

* Assist - OpenAI library port (#25948)

* Assist - OpenAI library port

* Add tests

* Address code review comments

* Added comment

* Partial backport of #26058

* Move AI messages to a new file

* Prevent blocking on error

* Add comments.
Fix typo

* Assist - Execution web endpoint (#25955)

* Assist - Execution web endpoint

* Add test
Clean up code a bit

* Add missing username

* Address review comments

* Make more implementation shared between Terminal and Command Web Handlers

* Address review comments

* Address review comments

* Fixes after rebase
Add comments

* Add comments
Fix linter

* Add TELEPORT to Teleport related environment variable.

* Assist - Web endpoints (#26046)

* Assist - Web endpoints

* GCI

* proto rpc

* enforce endpoint checks

* disable in ui if disallowed by auth

* add comment

* unwrap stack

* Refactor code

---------

Co-authored-by: Joel Wejdenstål <jwejdenstal@icloud.com>

* [Assist] Fix random user selection (#26183)

Currently, after selecting a user for command execution in Teleport Assist the user can randomly change. This PR fixes this behavior.

* Add rate limiting to Assist (#26011)

* Add rate limiting to Assist

* Only rate limit Assist in Cloud

* Add a comment to assistantLimiter

* Fixes after rebase

* Add 'rate-limited' test case to assistant_test

* Handle CHAT_MESSAGE_UI in Assist web UI

* Add godoc

* CHAT_MESSAGE_UI -> CHAT_MESSAGE_ERROR

* Run assistant test cases in parallel

* Rename `ChatGPT` to `GPT-4` (#26272)

* Rename `ChatGPT` to `GPT4`

ChatGPT is a user friendly name, but is technically inaccurate.

* Apply suggestions from code review

Co-authored-by: Reed Loden <reed@goteleport.com>

* Lint fix

* Additional ChatGPT --> OpenAI GPT-4 fixes

---------

Co-authored-by: Reed Loden <reed@goteleport.com>

* Fix Assist rate-limiting in Cloud (#26342)

When Proxy is separate from Auth,
Proxy 'modules' will not contain meaningful data.
Instead, one must use ClusterFeatures fetched from the Auth server

* Assist UI improvements (#26365)

* Update assist warning wording, add link to ToS (#26396)

* Always render the portal for the assist title to go into (#26733)

* Stop using TokenPath for API key in Assist (#26671)

* [Assist] MFA support (#26719)

* Initial support for MFA in Assist

* UI webauth handler

* WebUI - WIP

* Run prettier

* Perform MFA ceremony only once.

* Cleanup JS

* Remove hacky WS logic

* Add cancel MFA logic

* [Assist] Prevent creating messages without conversation (#26797)

* Add parallel execution to assist. (#26563)

* Add parallel execution to assist.

* Extract execution logic to a new function.

* Add test

* Switch uber to std

* Address code review comments

* Fix display of Assist command executions with empty output (#27010)

* Fix display of Assist executions with empty output

* Lint

* Nit

* Improve display of commands that failed with code

* Address lint

* [Assist] Allow removing assist conversations (#26788)

* [Assist] Allow removing assist conversations

* Display landing page after the conversion is removed

* Improve styling and add a confirmation dialog

* Change the icon opacity to copy the main navigation

* Remove unused minus icon

* Add missing trace.wrap

---------

Co-authored-by: Ryan Clark <ryan.clark@goteleport.com>

* Backport fixes

* Regenerate go.sum to make the linter happy.

* Sync files after rebase

* After rebase fixes

---------

Co-authored-by: Joel Wejdenstål <jwejdenstal@icloud.com>
Co-authored-by: Justinas Stankevičius <justinas@users.noreply.github.com>
Co-authored-by: Mike Jensen <jentfoo@users.noreply.github.com>
Co-authored-by: Reed Loden <reed@goteleport.com>
Co-authored-by: Ryan Clark <ryan.clark@goteleport.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants