[Security Solution][Elastic AI Assistant] Adds Model Evaluation Tooling by spong · Pull Request #167220 · elastic/kibana

spong · 2023-09-26T08:17:58Z

Summary

This PR introduces a new internal/elastic_assistant/evaluate route and Evaluation Advanced Setting within the Assistant for benchmarking and testing models, agents, and other aspects of the Assistant configuration.

Enable via the assistantModelEvaluation experimental feature in your kibana.dev.yml (and add discoverInTimeline for good measure as well!):

xpack.securitySolution.enableExperimental: ['assistantModelEvaluation', 'discoverInTimeline']

Then access it from within the Advanced Settings modal in the Assistant. To use it, first select your Connectors/Models, then the corresponding Agent configurations, then the model you would like to use for final evaluation and the evaluation type; if the type is custom, you can specify the evaluation prompt that is sent to the evaluator model. Finally, specify the dataset and the output index that the results should be written to, then click Perform evaluation.
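The selections described above can be sketched as a single settings object plus a pre-flight check. This is a minimal illustration only: the `EvaluationSettings` interface, its field names, and `validateEvaluationSettings` are hypothetical and are not the request schema of the evaluate route in this PR. Requiring a prompt when the evaluation type is `custom` is also an assumption made for the example.

```typescript
// Hypothetical shape mirroring the UI selections; NOT the PR's actual schema.
interface EvaluationSettings {
  connectorIds: string[]; // selected Connectors/Models
  agentNames: string[]; // selected Agent configurations
  evaluatorConnectorId: string; // model used for the final evaluation
  evaluationType: string; // only 'custom' is treated specially here
  evaluationPrompt?: string; // prompt sent to the evaluator model
  dataset: unknown[]; // dataset entries, shape left open
  outputIndex: string; // index the results are written to
}

// Returns a list of human-readable problems; an empty list means "ready to run".
function validateEvaluationSettings(settings: EvaluationSettings): string[] {
  const errors: string[] = [];
  if (settings.connectorIds.length === 0) {
    errors.push('Select at least one connector/model');
  }
  if (settings.agentNames.length === 0) {
    errors.push('Select at least one agent configuration');
  }
  if (settings.evaluatorConnectorId.trim() === '') {
    errors.push('Select an evaluator model');
  }
  if (settings.evaluationType.trim() === '') {
    errors.push('Select an evaluation type');
  }
  // Assumption: a custom evaluation is only meaningful with a prompt.
  if (settings.evaluationType === 'custom' && !settings.evaluationPrompt?.trim()) {
    errors.push('Provide an evaluation prompt for the custom evaluation type');
  }
  if (settings.dataset.length === 0) {
    errors.push('Provide a non-empty dataset');
  }
  if (settings.outputIndex.trim() === '') {
    errors.push('Specify an output index');
  }
  return errors;
}
```

Collecting every problem in one pass, rather than throwing on the first, lets a settings form show all missing fields at once.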

Sample datasets can be found in x-pack/plugins/elastic_assistant/server/lib/model_evaluator/datasets, and include:

esql_dataset.json
query_dataset.json
security_labs.json
security_questions_dataset.json
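A dataset file like the ones listed above could be checked before it is submitted. The sketch below assumes each entry is an object with string `input` and `reference` fields; that shape is an assumption for illustration, since the schema of the bundled files is not shown in this description, and `DatasetEntry` and `parseDataset` are hypothetical names.

```typescript
// Assumed entry shape; the real dataset files may differ.
interface DatasetEntry {
  input: string; // question or prompt given to the agent
  reference: string; // expected answer used during evaluation
}

// Parses JSON text and verifies every entry matches the assumed shape.
function parseDataset(jsonText: string): DatasetEntry[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error('Dataset must be a JSON array');
  }
  return parsed.map((entry: unknown, index: number) => {
    if (typeof entry !== 'object' || entry === null) {
      throw new Error(`Dataset entry ${index} must be an object`);
    }
    const { input, reference } = entry as Record<string, unknown>;
    if (typeof input !== 'string' || typeof reference !== 'string') {
      throw new Error(`Dataset entry ${index} needs string "input" and "reference" fields`);
    }
    return { input, reference };
  });
}
```

Failing fast on a malformed entry avoids starting a long evaluation run that would only error partway through.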

Checklist

Delete any items that are not applicable to this PR.

Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support


x-pack/plugins/security_solution/common/experimental_features.ts

...es/kbn-elastic-assistant/impl/assistant/settings/evaluation_settings/evaluation_settings.tsx

...kbn-elastic-assistant/impl/assistant/settings/evaluation_settings/use_perform_evaluation.tsx

andrew-goldstein · 2023-09-26T22:08:57Z

x-pack/plugins/elastic_assistant/scripts/model_evaluator_script.ts

+          })
+          .showHelpOnFail(false),
+      (argv) => {
+        // performEvaluation({ dataset: DEFAULT_DATASET, logger }).catch((err) => {


this is a placeholder per an offline discussion

++, this is for utilizing the yarn evaluate-models CLI tooling. Ended up going the UI route first for flexibility/ease of use, but this is where we'll plumb through the CLI/test tooling.

x-pack/plugins/elastic_assistant/server/routes/evaluate/post_evaluate.ts

x-pack/plugins/elastic_assistant/server/lib/model_evaluator/datasets/esql_dataset.json

x-pack/plugins/elastic_assistant/server/lib/model_evaluator/evaluation.ts

andrew-goldstein

Thanks @spong for providing this capability to test at scale 🙏
✅ Desk tested locally
LGTM 🚀


elasticmachine · 2023-09-26T23:23:26Z

Pinging @elastic/security-solution (Team: SecuritySolution)


…scover link, and fixes unhandled promise exception


kibana-ci · 2023-09-29T15:30:25Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 8429801

Failed CI Steps

FTR Configs #5

Test Failures

[job] [logs] FTR Configs #5 / serverless observability UI navigation navigate observability sidenav & breadcrumbs

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`securitySolution`	4557	4560	+3

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`securitySolution`	12.8MB	12.8MB	+53.0KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`securitySolution`	62.7KB	62.7KB	+28.0B

Unknown metric groups

ESLint disabled line counts

id	before	after	diff
`elasticAssistant`	10	13	+3

Total ESLint disabled count

id	before	after	diff
`elasticAssistant`	10	13	+3

History

💔 Build #163801 failed 8657581
💔 Build #163768 failed 62c8cf0
💔 Build #163761 failed 84c20cc
💔 Build #163714 failed e4ece02
💛 Build #163639 was flaky d900c7b

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @spong

[Redo this PR](#167220) because [this PR](#167220), which merged shortly before, broke it and I had to fix an import. Co-authored-by: lcawl <lcawley@elastic.co>

spong added 2 commits September 26, 2023 02:05

Adds model evaluator

1a9d62c

Merge branch 'main' of github.com:elastic/kibana into assistant-esql-…

99c7383

…eval

spong added the release_note:skip, backport:skip, Team: SecuritySolution, Feature:Security Assistant, and v8.11.0 labels Sep 26, 2023

spong self-assigned this Sep 26, 2023

spong and others added 6 commits September 26, 2023 08:19

Merge branch 'main' of github.com:elastic/kibana into assistant-esql-…

fcb266a

…eval

Updates tsconfig for scripts folder

3f4d042

[CI] Auto-commit changed files from 'node scripts/lint_ts_projects --…

b19d8c2

…fix'

[CI] Auto-commit changed files from 'node scripts/lint_packages --fix'

01beda7

Write evaluation results to ES index

d5886f3

Merge branch 'main' of github.com:elastic/kibana into assistant-esql-…

120ead0

…eval