[Security Solution][Elastic AI Assistant] Adds Model Evaluation Tooling#167220
spong merged 25 commits into elastic:main
...es/kbn-elastic-assistant/impl/assistant/settings/evaluation_settings/evaluation_settings.tsx
...kbn-elastic-assistant/impl/assistant/settings/evaluation_settings/use_perform_evaluation.tsx
    })
    .showHelpOnFail(false),
    (argv) => {
      // performEvaluation({ dataset: DEFAULT_DATASET, logger }).catch((err) => {
this is a placeholder per an offline discussion
++, this is for utilizing the yarn evaluate-models CLI tooling. Ended up going the UI route first for flexibility/ease of use, but this is where we'll plumb through the CLI/test tooling.
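A minimal sketch of how the commented-out placeholder might be wired up once the CLI path is plumbed through, assuming `performEvaluation` returns a promise. The stub body, the `Logger` shape, the `runCli` helper, and the value of `DEFAULT_DATASET` are all hypothetical; only the call shape comes from the snippet above. The `.catch` keeps a failed run from surfacing as an unhandled promise rejection.

```typescript
// Hypothetical logger shape; the real Kibana logger has a richer interface.
interface Logger {
  info: (msg: string) => void;
  error: (msg: string) => void;
}

// Placeholder value; the actual default dataset is not shown in this excerpt.
const DEFAULT_DATASET: string[] = ['example input'];

// Stub standing in for the real evaluation; rejects when the dataset is empty.
async function performEvaluation(opts: { dataset: string[]; logger: Logger }): Promise<number> {
  if (opts.dataset.length === 0) {
    throw new Error('dataset is empty');
  }
  opts.logger.info(`Evaluating ${opts.dataset.length} example(s)`);
  return opts.dataset.length;
}

// Hypothetical handler body: run the evaluation and convert any rejection into
// a logged error plus a non-zero exit code, rather than leaving it unhandled.
async function runCli(dataset: string[], logger: Logger): Promise<number> {
  return performEvaluation({ dataset, logger })
    .then(() => 0)
    .catch((err: Error) => {
      logger.error(`Evaluation failed: ${err.message}`);
      return 1;
    });
}
```

A command handler could then call `runCli(DEFAULT_DATASET, logger)` and assign the resolved value to `process.exitCode`.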
x-pack/plugins/elastic_assistant/server/routes/evaluate/post_evaluate.ts
x-pack/plugins/elastic_assistant/server/lib/model_evaluator/datasets/esql_dataset.json
x-pack/plugins/elastic_assistant/server/lib/model_evaluator/evaluation.ts
andrew-goldstein left a comment
Thanks @spong for providing this capability to test at scale 🙏
✅ Desk tested locally
LGTM 🚀
Pinging @elastic/security-solution (Team: SecuritySolution)
…scover link, and fixes unhandled promise exception
💛 Build succeeded, but was flaky
Summary
This PR introduces a new `internal/elastic_assistant/evaluate` route and `Evaluation` Advanced Setting within the Assistant for benchmarking and testing models, agents, and other aspects of the Assistant configuration.

Enable via the `assistantModelEvaluation` experimental feature in your `kibana.dev.yml` (and add `discoverInTimeline` for good measure as well), then access it from within the `Advanced Settings` modal in the Assistant.

To use, first select your Connectors/Models, then the corresponding Agent configurations, then the model you would like to use for the final evaluation and the evaluation type. If the type is `custom`, you can specify the evaluation prompt that is sent to the evaluator model. Finally, specify the `dataset` and the `output index` that the results should be written to, then click `Perform evaluation`.

Sample datasets can be found in `x-pack/plugins/elastic_assistant/server/lib/model_evaluator/datasets`, and include:

- `esql_dataset.json`
- `query_dataset.json`
- `security_labs.json`
- `security_questions_dataset.json`

Checklist
Delete any items that are not applicable to this PR.
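
As a rough illustration of the inputs listed in the Summary, the sketch below models one evaluation run and checks it before submission. All type, field, and function names (`EvaluationSettings`, `connectorIds`, `agentNames`, `validateSettings`, and so on) are hypothetical and chosen only to mirror the settings described above; they are not the route's actual schema.

```typescript
// Hypothetical shape mirroring the settings described in the Summary.
// None of these names are taken from the actual route schema.
interface EvaluationSettings {
  connectorIds: string[]; // Connectors/Models to evaluate
  agentNames: string[]; // Agent configurations to run
  evaluatorModel: string; // model used for the final evaluation
  evaluationType: string; // 'custom' allows a user-supplied prompt
  evaluationPrompt?: string; // only meaningful when evaluationType is 'custom'
  dataset: string; // dataset to evaluate against (representation assumed)
  outputIndex: string; // index the results are written to
}

// Returns a list of human-readable problems; an empty list means the
// settings look complete enough to submit.
function validateSettings(s: EvaluationSettings): string[] {
  const problems: string[] = [];
  if (s.connectorIds.length === 0) problems.push('select at least one connector/model');
  if (s.agentNames.length === 0) problems.push('select at least one agent configuration');
  if (s.evaluatorModel.trim() === '') problems.push('choose an evaluator model');
  if (s.evaluationType === 'custom' && !s.evaluationPrompt?.trim()) {
    problems.push('a custom evaluation needs an evaluation prompt');
  }
  if (s.dataset.trim() === '') problems.push('specify a dataset');
  if (s.outputIndex.trim() === '') problems.push('specify an output index');
  return problems;
}
```

Validating up front like this is one way to surface missing selections before anything is sent to the evaluator model; the real UI may enforce these constraints differently.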