-
Notifications
You must be signed in to change notification settings - Fork 8.6k
[Security Assistant] AI Assistant - Better Solution for OSS models (#10416) #194166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
stephmilovic
merged 25 commits into
elastic:main
from
e40pud:security/genai/10416-OSS-models
Oct 7, 2024
Merged
Changes from 13 commits
Commits
Show all changes
25 commits
Select commit
Hold shift + click to select a range
bc4228d
OSS LLM
e40pud 4d5c841
OSS LLMs streaming fixes
e40pud 45c8ac0
Use api URL to verify OSS llms vs OpenAI
e40pud 98abd53
Prompting
e40pud 7a7f7e2
Use provider type to better identify OSS model - handles case with th…
e40pud 6903909
Enable `NaturalLanguageESQLTool` for OSS models like Llama
e40pud 7436f34
Fix the issue with extra escape backslash characters which breaks the…
e40pud ae638e6
Revert streaming events parsing
e40pud a03e5ea
Simplified OSS model streaming
e40pud 5d2e1f2
Add OSS model specific prompt to the tool description
e40pud f9eb9d7
Brush up implementation and add some unit tests
e40pud 351c2f7
Remove redundant code
e40pud d78af80
Merge branch 'main' into security/genai/10416-OSS-models
e40pud 8689cc2
Merge branch 'main' into security/genai/10416-OSS-models
elasticmachine 046c3c5
Merge branch 'main' into security/genai/10416-OSS-models
e40pud c0a00eb
Merge branch 'main' into security/genai/10416-OSS-models
e40pud 69543ce
Make sure we log evaluation results and errors
e40pud 5c7bf7f
Merge branch 'main' into security/genai/10416-OSS-models
e40pud db9fd74
Merge branch 'main' into security/genai/10416-OSS-models
e40pud d0458eb
Merge branch 'main' into security/genai/10416-OSS-models
e40pud 26cb4c3
Merge branch 'main' into security/genai/10416-OSS-models
elasticmachine 513d4bf
Review feedback: long time request issue
e40pud 1d60365
Update x-pack/plugins/elastic_assistant/server/routes/utils.ts
e40pud 0f96231
Review feedback: naming
e40pud 59fb225
Review feedback: re-instantiate AbortController after the `abort`
e40pud File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this
awaitsolves the issue where different OSS models fail multiple tests due to error:This mostly happens with Llama model, but occasionally with Mistral as well.
Here is the example of those failures https://smith.langchain.com/o/b739bf24-7ba4-4994-b632-65dd677ac74e/datasets/261dcc59-fbe7-4397-a662-ff94042f666c/compare?selectedSessions=f26a13b9-73ba-4e6d-992b-3b4eda61bb20&baseline=undefined&activeSession=f26a13b9-73ba-4e6d-992b-3b4eda61bb20
I feel it has something to do with the slowness of the model (not proved yet) which leads to timeouts.
@spong How important for us to make sure that all experiments are finished and only then respond back? Can we just schedule those experiments and return and log evaluation results and handle errors in a different way here? Something like this:
also we would need to add better error handling in this case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that should be fine. We used to use the result output and write that to the local ES, but don't do that anymore, and so just debug log the output since we've been using LangSmith for analysis after the fact. Really we just need to know if the eval was successful from the API perspective.
That said, I'm curious why this
awaitwould cause internal issues with the chat clients. That shouldn't change anything with the concurrency or executions, but seems with it that previous ones are getting cancelled? Perhaps there's some shared state between the chat clients like @stephmilovic fixed withcreateLlmInstanceover in #190004?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you turn down
maxConcurrencyto1and run with theawaitdo you still see it? Curious if rate limiting or something might be cancelling the previous runs?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I was able to run simple stress testing against our llama deployment and I was able to get this error which I guess is the reason our evaluations are failing sometimes
I don't think, this is something we cannot control in code and will need to be configured in llm deployment if we need it for evaluations. I found this article https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest#assign-quota which shows how to address that in deployment settings (sent to James as well). There we have an option to assign Tokens-Per-Minute (TPM) to llm deployment.
Btw, lowering
maxConcurrencydid not help with the issue. Setting it to1failed requests even faster for whatever reasons.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To lower chances of hitting rate limit failure, I removed the
awaitand restructured code in order to log results and errors at the end of execution. This will allow users to avoid 429 errors in most cases.After talking to @jamesspi, we agreed that this is something user can fix on their end by increasing the quota and resources. We would prefer to keep things simple instead of finding a good solution to deal with 429 during evaluations and make our code complicated.
@spong what do you think?