[Assist] Classify messages for usage analytics by hugoShaka · Pull Request #28221 · gravitational/teleport

hugoShaka · 2023-06-23T18:42:51Z

Fixes part of e.1629

Event reporting is not included here; it will be added in another PR.

jakule · 2023-06-23T19:47:25Z

+	}
+
+	cleanedCategory := strings.ToLower(strings.Trim(strings.TrimSpace(category), "."))
+	if _, ok := classes[cleanedCategory]; ok {


I'd consider using strings.Contains instead of an exact match, to tolerate any extra text ChatGPT might add around the category name?

hugoShaka · 2023-06-27

I'm afraid this change would break if we ever create a category whose name contains another category's name. It can also lead to false positives. From what I tested, error rates are quite low. I'd prefer to wait for real data before doing this kind of optimization.
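
To make the trade-off in this thread concrete, here is a minimal sketch comparing exact and substring matching. The `classes` set is hypothetical: `"teleport configuration"` is invented for this example to show one name containing another, and the helpers `cleanCategory`, `exactMatch` and `containsMatches` are illustrative names, not functions from the PR. Only the normalization expression is copied from the diff.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// classes is a hypothetical category set for this sketch only.
// "teleport configuration" is invented to show a name that contains another name.
var classes = map[string]struct{}{
	"configuration":          {},
	"teleport configuration": {},
}

// cleanCategory applies the normalization from the diff:
// trim surrounding whitespace, trim surrounding dots, then lowercase.
func cleanCategory(category string) string {
	return strings.ToLower(strings.Trim(strings.TrimSpace(category), "."))
}

// exactMatch accepts a reply only if the cleaned reply is a known category.
func exactMatch(reply string) (string, bool) {
	cleaned := cleanCategory(reply)
	if _, ok := classes[cleaned]; !ok {
		return "", false
	}
	return cleaned, true
}

// containsMatches returns, in sorted order, every category whose name
// appears anywhere in the cleaned reply.
func containsMatches(reply string) []string {
	cleaned := cleanCategory(reply)
	var found []string
	for name := range classes {
		if strings.Contains(cleaned, name) {
			found = append(found, name)
		}
	}
	sort.Strings(found)
	return found
}

func main() {
	replies := []string{
		"Teleport configuration.",
		"This is not a configuration request",
	}
	for _, reply := range replies {
		class, ok := exactMatch(reply)
		fmt.Printf("reply=%q exact=(%q, %v) contains=%v\n",
			reply, class, ok, containsMatches(reply))
	}
}
```

With these assumed categories, substring matching yields two candidates for the first reply and a match for the second reply, which is not a category answer at all. Exact matching accepts the first and rejects the second.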

jakule · 2023-06-23T19:50:59Z

+		"troubleshooting",
+		"Troubleshooting",
+		"Troubleshooting.",
+		"non-existent",


Have you ever seen something like "As a Large Language Model, I think that the input can be classified as troubleshooting", or something similar? 😅

hugoShaka · 2023-06-27

From the few dozen tests I ran, I never observed such a response. gpt-3.5 tends to add uppercase letters and trailing dots; gpt-4 has not made any mistakes so far.
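
As an illustration of how the inputs from this test hunk behave, the sketch below runs them through the normalization expression from the first diff. The function name `normalizeClass`, the error value and the shortened `knownClasses` set are assumptions made for this example. Two extra inputs (a reply with a space before the dot, and the verbose sentence from the question above) are added to show what the expression does not absorb.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// knownClasses is an assumed, shortened category set for illustration.
var knownClasses = map[string]struct{}{
	"command execution": {},
	"troubleshooting":   {},
}

// errUnknownClass is a hypothetical sentinel error for replies that do not
// map to a known category.
var errUnknownClass = errors.New("unknown message class")

// normalizeClass cleans the model reply with the expression from the diff
// and validates the result against knownClasses.
func normalizeClass(reply string) (string, error) {
	cleaned := strings.ToLower(strings.Trim(strings.TrimSpace(reply), "."))
	if _, ok := knownClasses[cleaned]; !ok {
		return "", fmt.Errorf("%w: %q", errUnknownClass, cleaned)
	}
	return cleaned, nil
}

func main() {
	replies := []string{
		"troubleshooting",
		"Troubleshooting",
		"Troubleshooting.",
		"non-existent",
		// Not in the diff: the space before the dot survives, because
		// TrimSpace runs before the dot is trimmed.
		"Troubleshooting .",
		// Not in the diff: a verbose reply like the one asked about above.
		"As a Large Language Model, I think that the input can be classified as troubleshooting",
	}
	for _, reply := range replies {
		class, err := normalizeClass(reply)
		fmt.Printf("%q -> class=%q err=%v\n", reply, class, err)
	}
}
```

Under these assumptions the first three replies resolve to `troubleshooting` and the last three are rejected, so a caller needs a fallback for unclassified messages.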

jakule · 2023-06-23T19:52:28Z

+var MessageClasses = map[string]string{
+	"command execution": "the user want to execute a command on one or many servers",
+	"troubleshooting":   "the user wants to diagnose a problem or understand an error message",
+	"configuration":     "the user wants to generate configuration for a software which is not Teleport",


How can the LLM know if the software is "Teleport"? Also, do we even care about Teleport/non-Teleport scenario?

hugoShaka · 2023-06-27

The prompt contains a very short description of Teleport. From what I observed, this is enough for gpt-4 to reliably infer whether the request is about Teleport or not (gpt-3.5 showed some issues but was right most of the time). I suspect gpt-4 also learned what Teleport is during training and can leverage this information when classifying.

I think it is important to separate Teleport-related requests from non-Teleport requests, as the two categories are not actionable in the same way. We can help the model answer Teleport configuration requests by giving it access to the docs and allowing it to link the user back to the docs. On the other hand, we rely on the generic part of the model to answer most other configuration questions (embedding man pages might help, but this is a long shot).

From what I understand, we will want to know whether users are asking "write me a working nginx configuration" or "I want the teleport agent configuration to provide access to an SSH server and a database at the same time".
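
For readers who want to picture how a class map and a short product description could be combined into a classification prompt, here is a rough sketch. The prompt wording, the `teleportBlurb` text and the function names are assumptions; the actual prompt lives in lib/ai/model/prompt.go and is not shown on this page. Only the two map entries are copied from the diff.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// messageClasses holds two entries copied from the diff above.
var messageClasses = map[string]string{
	"troubleshooting": "the user wants to diagnose a problem or understand an error message",
	"configuration":   "the user wants to generate configuration for a software which is not Teleport",
}

// teleportBlurb is a placeholder for the short Teleport description
// mentioned in the thread; the real text is not shown in this PR page.
const teleportBlurb = "Teleport is a tool that provides access to servers, databases and other infrastructure."

// sortedClassNames returns the category names in a stable order so that
// the generated prompt does not depend on map iteration order.
func sortedClassNames() []string {
	names := make([]string, 0, len(messageClasses))
	for name := range messageClasses {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// buildClassificationPrompt assembles a hypothetical system prompt:
// the product blurb, one line per category, and an output instruction.
func buildClassificationPrompt() string {
	var b strings.Builder
	b.WriteString(teleportBlurb + "\n")
	b.WriteString("Classify the user message into exactly one of these categories:\n")
	for _, name := range sortedClassNames() {
		fmt.Fprintf(&b, "- %s: %s\n", name, messageClasses[name])
	}
	b.WriteString("Answer with the category name only.")
	return b.String()
}

func main() {
	fmt.Println(buildClassificationPrompt())
}
```

Asking for the category name only is what makes the exact-match lookup discussed earlier workable; a looser instruction would call for looser parsing.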

jakule · 2023-06-23T19:54:01Z

+	"configuration":     "the user wants to generate configuration for a software which is not Teleport",
+	"list resources":    "the user wants to list the available resources connected to the Teleport cluster",
+	"access request":    "the user requests access to one or many resources from the Teleport cluster",
+	"teleport action":   "the user wants to do something with its Teleport cluster",


Something like list resources? 😅

hugoShaka · 2023-06-27

I extended "list resources" and changed this category to reduce the overlap.

jakule · 2023-06-23T19:55:18Z

+	pack := s.authPack(t, "foo")
+
+	// Real test: we craft a request asking for a summary
+	endpoint := pack.clt.Endpoint("webapi", "assistant", "title", "summary")


I'm pretty sure that we have a wrapper for that. If not, we should probably add it.

jakule

LGTM

* [Assist] Scaffold the chat-loop onto a multi-step thinking model (#27075) * agent scaffold conversion * command input validation * rename Agent.Think and replace debug logs with trace logs * doc * action docs * godocs * clarify * remove unused code * remove tests which relied on the old non-agent model interaction with the llm * fix broken e * Add node name to the Assist execution result (#27635) * Add node name to the Assist execution result Currently, only node ID is returned on the command execution result in Assist. For better UX we want to display Node name which id more human friendly rather than a node ID which is a UUID. Adding the value to returned payload sounds cheaper than calling an API to get node names. * Add test * Extract commandExecResult struct * Fix test after rebase * Fix command execution test flakiness (#27704) Fix ``` --- FAIL: TestExecuteCommand (1.46s) testing.go:1206: TempDir RemoveAll cleanup: unlinkat /tmp/TestExecuteCommand3553793052/002/log/upload/streaming/default: directory not empty FAIL ``` error * [Assist] Fix panic when writing to one WS from multiple threads (#27828) * [Assist] Fix panic when writing to one WS from multiple threads Fixes https://github.com/gravitational/teleport.e/issues/1650 * Remove mutex on SetReadDeadline * Move SetPongHandler * Fix typos * Fix command output showing when running on multiple nodes (#27936) * ai: Add a node embedding watcher (#27204) * ai: add embeddings basic support - add Embeddings service and its local implementation - add Embedding type and proto message - add nodeEmbeddingCollector tracking nodes - add NodeEmbeddingWatcher watching for events adn sending them to the collector - add the Embedder interface and its openai implementation * ai: adapt embeddings to the vector index * fixup! ai: adapt embeddings to the vector index * fixup! fixup! 
ai: adapt embeddings to the vector index * Update lib/service/service.go Co-authored-by: Jakub Nyckowski <jakub.nyckowski@goteleport.com> * address feedback pt.1 * address feedback pt.2: store protobuf message in backend * address feedback pt.3: have GetEmbeddings return a stream * Update lib/services/embeddings.go Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com> * address feedback pt.4: extract embedding logic out of Embeddings service * fixup! address feedback pt.4: extract embedding logic out of Embeddings service * address feedback pt.5: simpler error handling when embedding fails * fix tests pt.1 * fix tests pt.2 * fix tests pt.3 * [Assist] Replace embedding watcher (#27953) Change the way how the embeddings are calculated. Instead of creating a watcher in Auth, we will process all nodes every hour and process embeddings if any embeddings are missing or any node has been updated. --------- Co-authored-by: Jakub Nyckowski <jakub.nyckowski@goteleport.com> Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com> * Restore `lib/ai` tests (#28077) * Restore `lib/ai` tests The tests were removed as a part of #27075. This PR updates the tests to use the new logic. * Fix tests * Restore lib/web tests * GCI * Move test handler to a common place * Fix used token test * Add comment * Remove duplicate imports (#27886) * [Assist] Remove the empty assist message (#28125) * [Assist] Remove the empty assist message Assist shows an empty message at the beginning of each conversation when reading it from DB. This PR fixes that behavior and adds a test to prevent this from happening in the future. 
* Address code review comments * Address code review comments * Skip embedding processor on Cloud Non-Team plan (#28197) * ai: compute opportinistic summary of command execution (#28033) * ai: compute opportinistic summary of command execution * ai: add streaming summary back after rebase on new front-end * Lint and fix tests pt.1 * reference nodes by name and add tests * Lint, fix tests and address feedback * Attempt to tame the stream close monster * fixup! Attempt to tame the stream close monster * [Assist] Do not close the WS after command execution (#28246) * Revert "fixup! Attempt to tame the stream close monster" This reverts commit 8537aa2. * Revert "Attempt to tame the stream close monster" This reverts commit e0c861d. * Do not close the WS after command execution * Fix tests and lint * fixup! Fix tests and lint * undo put web test command into constant --------- Co-authored-by: Jakub Nyckowski <jakub.nyckowski@goteleport.com> * [Assist] Include embeddings in the prompt (#28116) * [Assist] Include embeddings in the prompt * Add comments GCI Minor fixes * Move stuff * Fix tests * Fix tests * Fixes after rebase Apply code review suggestions. * Address review comments * After rebase fix * Improve error handling and embedding prompts; fix typos (#28403) * "Improve error handling and embedding prompts; fix typos" This commit encompasses several changes. First, an error handling routine has been added in AssistContext.tsx to properly close a WebSocket connection and finish all results. The intent is to ensure that execution fails gracefully when a session doesn't end normally. In tool.go, user instructions have been made more explicit to ensure users check access to nodes before generating any commands. It warns them that not checking access will cause error. Also, some minor typos were corrected in agent.go and messages.go for better readability. 
* "Refactor 'hosts' to 'nodes' in AI Tool Descriptions" This commit refactors the language from 'host' terminology to 'node' terminology in the AI tool's generated responses as the LLM seems to be confused when generating queries with embeddings. * Update expected test values in chat_test.go The expected values in three different tests in chat_test.go have been updated. This change was required because the underlying algorithm has been adjusted and these modifications will keep the tests aligned with the current algorithm's behavior. * Add missing imports * Introduce user preferences (#28291) * Add user preferences feature * Add missing license header * Fix the order of arguments to require.Equal * Update lib/web/userpreferences.go Co-authored-by: Michelle Bergquist <11967646+michellescripts@users.noreply.github.com> * Add a `GetUserPreferencesResponse` message * Remove unused logger * Use .Put instead of .Create/.Update * Add missing godoc * trace.Wrap the happy path --------- Co-authored-by: Michelle Bergquist <11967646+michellescripts@users.noreply.github.com> * Shut down embedding processor on graceful exit (#28356) * Refactor websocket termination and stream handling (#28452) * Refactor websocket termination and stream handling Refactored websocket stream shutdown and error handling. Replaced `Close()` with `SendCloseMessage()` for better control over the websocket connection termination process. Added checks for the validity of channels to prevent reading from closed channels. The commit also includes minor typo fixes. * Remove unused completedC * Remove unnecessary select blocks in terminal.go The select blocks used in terminal.go for reading data from channels were unnecessary as we were just pulling from a single channel. Removed the select block and directly attempted to read from the channel. These changes increase code readability and integrity by removing unnecessary select blocks. In the command_test.go, an explanatory comment was added for clarity. 
* Remove commented code * Replace trace.NewAggregate with trace.Wrap as aggregation is not needed. * Add the UI for Assist's settings (#28413) * Add the UI for Assist's settings * Add typing * Fix test by wrapping render in LayoutContextProvider * Run prettier * Assist: fix summary logic (#28487) * Update command.go * simplify export signature * assist: add classification code (#28221) * [Assist] Provide interactive updates during agent execution (#27893) * send progress update messages during agent thoughts * handle new output format * define json tags for serialized fields * use streaming api * fan streaming from model loop * fix streaming * stream progress updates * Update lib/assist/assist.go Co-authored-by: Jakub Nyckowski <jakub.nyckowski@goteleport.com> * remove useless mute * nits * Update lib/ai/model/agent.go Co-authored-by: Jakub Nyckowski <jakub.nyckowski@goteleport.com> * fix merge * fix misc * more misc fixes * what * what2 * weird eof errors? * Fix tests UI integration * Fix other tests * Linter fixes * Comment out token counting for assist streams to avoid race condition. * Fix more tests --------- Co-authored-by: Jakub Nyckowski <jakub.nyckowski@goteleport.com> * Remove console.log in AssistContext (#28607) --------- Co-authored-by: Joel <jwejdenstal@goteleport.com> Co-authored-by: Ryan Clark <ryan.clark@goteleport.com> Co-authored-by: Hugo Shaka <hugo.hervieux@goteleport.com> Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com> Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com> Co-authored-by: Justinas Stankevičius <justinas@users.noreply.github.com> Co-authored-by: Michelle Bergquist <11967646+michellescripts@users.noreply.github.com>

hugoShaka requested review from jakule and xacrimon June 23, 2023 18:42

github-actions Bot added the size/md label Jun 23, 2023

github-actions Bot requested review from gzdunek and ravicious June 23, 2023 18:43

jakule reviewed Jun 23, 2023


hugoShaka changed the title from "Hugo/classify assist messages" to "[Assist] Classify messages for usage analytics" Jun 27, 2023

hugoShaka force-pushed the hugo/classify-assist-messages branch from df7a062 to 5931b05 June 27, 2023 19:27

jakule approved these changes Jun 28, 2023


Comment thread lib/ai/model/prompt.go Outdated

hugoShaka force-pushed the hugo/classify-assist-messages branch 2 times, most recently from bdeca95 to f71689d June 28, 2023 20:34

hugoShaka mentioned this pull request Jun 29, 2023

assist: add execution and discussion usage events #28492

Merged

xacrimon approved these changes Jun 30, 2023


public-teleport-github-review-bot Bot removed request for gzdunek and ravicious June 30, 2023 16:45

assist: add classification code

a8f199f

hugoShaka force-pushed the hugo/classify-assist-messages branch from f71689d to a8f199f June 30, 2023 19:07

hugoShaka enabled auto-merge June 30, 2023 19:07

hugoShaka added this pull request to the merge queue Jun 30, 2023

Merged via the queue into master with commit 303aada Jun 30, 2023

hugoShaka deleted the hugo/classify-assist-messages branch June 30, 2023 19:43

jakule pushed a commit that referenced this pull request Jul 3, 2023

assist: add classification code (#28221)

d862226

hugoShaka merged 1 commit into master from hugo/classify-assist-messages
