Conversation
Latest discussion about the rate limit: https://gravitational.slack.com/archives/C0509EYASCW/p1683738890958669
Using this instead of oxy ratelimiter, as the latter is quite tightly coupled with individual http.Request-s.
Can you add this PR comment as a code comment too? The reasoning won't be easily visible to people after this PR is merged.
Force-pushed from f6788a8 to ed8a4fd.
Do we have an issue to track it? If yes, can you link it here?
This should not be an assistant message. We should create a new message type, so the UI can react to it instead of displaying an error as a regular message. We should also close the WS connection as a client should not write anything more.
I have added a new message type, but kept the frontend behavior the same for now. The WS connection now closes.
Force-pushed from ed8a4fd to b598ee0.
Added a test case where the rate limit is hit.
Force-pushed from 0742181 to c52f349.
jakule left a comment
I'm fine with the backend changes, but please wait for @ryanclark to approve the UI changes.
  ): Promise<MessagesAction> {
-   if (message.type === 'CHAT_MESSAGE_ASSISTANT') {
+   if (
+     message.type === 'CHAT_MESSAGE_ASSISTANT' ||
- TotalTokens: int64(usedTokens.Prompt + usedTokens.Competition),
+ TotalTokens: int64(usedTokens.Prompt + usedTokens.Completion),
  PromptTokens: int64(usedTokens.Prompt),
  CompletionTokens: int64(usedTokens.Competition),
  // Try to consume a small amount of tokens first.
  const lookaheadTokens = 100
  if !h.assistantLimiter.AllowN(time.Now(), lookaheadTokens) {
      err := onMessageFn(assist.MessageKindUIMessage, []byte("You have reached the rate limit. Please try again later."), h.clock.Now().UTC())
Should we send a PostHog event when this happens? Maybe this question doesn't make sense, but I have no idea how we are using PostHog now.
Might be a good idea, I'll mark it down somewhere. Just don't want to complicate this PR further for now.
  // SSOLoginFailureMessage is a generic error message to avoid disclosing sensitive SSO failure messages.
  SSOLoginFailureMessage = "Failed to login. Please check Teleport's log for more details."

  assistantTokensPerHour = 140
  // assistantLimiterRate is the rate (in tokens per second)
  // at which tokens for the assistant rate limiter are replenished
  assistantLimiterRate = rate.Limit(assistantTokensPerHour / float64(time.Hour/time.Second))
nit:
- assistantLimiterRate = rate.Limit(assistantTokensPerHour / float64(time.Hour/time.Second))
+ assistantLimiterRate = rate.Limit(assistantTokensPerHour / time.Hour.Seconds())
Sadly, time.Hour.Seconds() cannot be used in a const definition: it is a method call, not a constant expression.
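To illustrate the point about const definitions: dividing the typed constants time.Hour and time.Second yields another constant, while time.Hour.Seconds() is evaluated at run time. The sketch below is a standalone illustration only. The names tokensPerHour, ratePerSecond, and ratePerSecondVar are hypothetical, and it uses a plain float64 instead of rate.Limit so that it needs nothing beyond the standard library.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// tokensPerHour mirrors the value of assistantTokensPerHour in the diff.
	tokensPerHour = 140

	// ratePerSecond is a valid constant: time.Hour/time.Second is a
	// constant Duration (3600), and converting a constant to float64
	// yields another constant.
	ratePerSecond = tokensPerHour / float64(time.Hour/time.Second)

	// The following line would not compile, because a method call is
	// not a constant expression:
	// badRate = tokensPerHour / time.Hour.Seconds()
)

// ratePerSecondVar shows the suggested form. It works, but only as a
// variable initialized at run time (hypothetical name).
var ratePerSecondVar = tokensPerHour / time.Hour.Seconds()

func main() {
	fmt.Printf("const: %.6f tokens/s\n", ratePerSecond)
	fmt.Printf("var:   %.6f tokens/s\n", ratePerSecondVar)
}
```

Keeping the rate as a const keeps it alongside the other constants in the block, at the cost of the slightly noisier float64(time.Hour/time.Second) expression.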
ryanclark left a comment
LGTM, we should probably style CHAT_MESSAGE_UI messages a bit differently than normal messages to show that it's an error or something, but that can be done in another PR
  // MessageKindSystemMessage is the type of Assist message that contains the system message.
  MessageKindSystemMessage MessageType = "CHAT_MESSAGE_SYSTEM"
  // MessageKindUIMessage is the type of Assist message that is presented to user as information, but not stored persistently in the conversation. This can include backend error messages and the like.
  MessageKindUIMessage MessageType = "CHAT_MESSAGE_UI"
Will this ever be anything other than an error?
Honestly dunno, for now this rate-limit is the only case 🤷
@justinas @ryanclark Can we just change the name to CHAT_MESSAGE_ERROR and keep using that type for all assist-related messages?
Force-pushed from e1ab55c to 9b5f09e.
* Add rate limiting to Assist
* Only rate limit Assist in Cloud
* Add a comment to assistantLimiter
* Fixes after rebase
* Add 'rate-limited' test case to assistant_test
* Handle CHAT_MESSAGE_UI in Assist web UI
* Add godoc
* CHAT_MESSAGE_UI -> CHAT_MESSAGE_ERROR
* Run assistant test cases in parallel
* Add rate limiting to Assist (#26011)
* Add rate limiting to Assist
* Only rate limit Assist in Cloud
* Add a comment to assistantLimiter
* Fixes after rebase
* Add 'rate-limited' test case to assistant_test
* Handle CHAT_MESSAGE_UI in Assist web UI
* Add godoc
* CHAT_MESSAGE_UI -> CHAT_MESSAGE_ERROR
* Run assistant test cases in parallel
* Fix incorrect merge
Co-authored-by: Justinas Stankevičius <justinas@users.noreply.github.com>
* Assist - API implementation (#25810) * Assist - API implementation * Add Assistant CRUD test * Move assistant API to a separate file. * Rename GRPC methods Fix comments * Address PR comments * Update GRPC API - addressing code review comments * Move assist API to a new GRPC service * Add Assistant RBAC rules to this backport. * Add missing license * Update comment. * Move username and ID out of AssistantMessage (#25964) * Move username and ID out of AssistantMessage #25810 added username and conversation ID to AssistantMessage. They aren't needed inside the message and can be moved out of it. This PR also fixes ` grpc: error while marshaling: proto: Marshal called with nil` in `CreateAssistantMessage`. Note: It's safe to change protobuf numbers as this code hasn't been released or backported yet. * Fix test * make grpc * Assist - Configuration and usage (#25953) * Assist - Configuration and usage * Add network config test * Add config test * Run GCI * Address review comments * Assist - OpenAI library port (#25948) * Assist - OpenAI library port * Add tests * Address code review comments * Added comment * Partial backport of #26058 * Move AI messages to a new file * Prevent blocking on error * Add comments. Fix typo * Assist - Execution web endpoint (#25955) * Assist - Execution web endpoint * Add test Clean up code a bit * Add missing username * Address review comments * Make more implementation shared between Terminal and Command Web Handlers * Address review comments * Address review comments * Fixes after rebase Add comments * Add comments Fix linter * Add TELEPORT to Teleport related environment variable. 
* Assist - Web endpoints (#26046) * Assist - Web endpoints * GCI * proto rpc * enforce endpoint checks * disable in ui if disallowed by auth * add comment * unwrap stack * Refactor code --------- Co-authored-by: Joel Wejdenstål <jwejdenstal@icloud.com> * [Assist] Fix random user selection (#26183) Currently, after selecting a user for command execution in Teleport Assist the user can randomly change. This PR fixes this behavior. * Add rate limiting to Assist (#26011) * Add rate limiting to Assist * Only rate limit Assist in Cloud * Add a comment to assistantLimiter * Fixes after rebase * Add 'rate-limited' test case to assistant_test * Handle CHAT_MESSAGE_UI in Assist web UI * Add godoc * CHAT_MESSAGE_UI -> CHAT_MESSAGE_ERROR * Run assistant test cases in parallel * Rename `ChatGPT` to `GPT-4` (#26272) * Rename `ChatGPT` to `GPT4` ChatGPT is a user friendly name, but is technically inaccurate. * Apply suggestions from code review Co-authored-by: Reed Loden <reed@goteleport.com> * Lint fix * Additional ChatGPT --> OpenAI GPT-4 fixes --------- Co-authored-by: Reed Loden <reed@goteleport.com> * Fix Assist rate-limiting in Cloud (#26342) When Proxy is separate from Auth, Proxy 'modules' will not contain meaningful data. Instead, one must use ClusterFeatures fetched from the Auth server * Assist UI improvements (#26365) * Update assist warning wording, add link to ToS (#26396) * Always render the portal for the assist title to go into (#26733) * Stop using TokenPath for API key in Assist (#26671) * [Assist] MFA support (#26719) * Initial support for MFA in Assist * UI webauth handler * WebUI - WIP * Run prettier * Perform MFA ceremony only once. * Cleanup JS * Remove hacky WS logic * Add cancel MFA logic * [Assist] Prevent creating messages without conversation (#26797) * Add parallel execution to assist. (#26563) * Add parallel execution to assist. * Extract execution logic to a new function. 
* Add test * Switch uber to std * Address code review comments * Fix display of Assist command executions with empty output (#27010) * Fix display of Assist executions with empty output * Lint * Nit * Improve display of commands that failed with code * Address lint * [Assist] Allow removing assist conversations (#26788) * [Assist] Allow removing assist conversations * Display landing page after the conversion is removed * Improve styling and add a confirmation dialog * Change the icon opacity to copy the main navigation * Remove unused minus icon * Add missing trace.wrap --------- Co-authored-by: Ryan Clark <ryan.clark@goteleport.com> * Backport fixes * Regenerate go.sum to make the linter happy. * Sync files after rebase * After rebase fixes --------- Co-authored-by: Joel Wejdenstål <jwejdenstal@icloud.com> Co-authored-by: Justinas Stankevičius <justinas@users.noreply.github.com> Co-authored-by: Mike Jensen <jentfoo@users.noreply.github.com> Co-authored-by: Reed Loden <reed@goteleport.com> Co-authored-by: Ryan Clark <ryan.clark@goteleport.com>
Rate limit Assist in Cloud based on the amount of tokens used (cluster-wide). This is a naive implementation that only stores the counter in memory.