
[8.x] [Security Solution] [AI Assistant] security assistant content references (#206683)#208666

Merged
KDKHD merged 4 commits into elastic:8.x from KDKHD:backport/8.x/pr-206683
Jan 29, 2025


@KDKHD (Member) commented Jan 29, 2025

Backport

This will backport the following commits from main to 8.x:

Questions?

Please refer to the Backport tool documentation

…ces (#206683)

> [!note]
> Planning to merge before the 9.0 feature freeze.
Documentation update issue:
elastic/security-docs#6473

> [!tip]
> ### Tip for the reviewer
> As a starting point for reviewing this PR, I suggest reading the
> section "How does it work (on a high level)?" and viewing the
> hyperlinked code. The linked code covers the main concepts of this
> feature; most of the remaining changes in the PR are API schema
> updates and tests.

## Summary

This PR adds citations to the security AI assistant. Citations are
produced when tools are used, and they are displayed in the LLM response
as numbered superscript elements. A label appears when the user hovers
over a numbered element; clicking the label opens a new tab that
displays the cited data.

## How to test:
1. Enable the feature flag:
   - Set the value to `true`
     [here](https://github.com/elastic/kibana/pull/206683/files#diff-f55be7c50853801c3933b48064ab9cbf0356e2941cd97c4c365be7a6ded9bffdR125)
2. Populate the security knowledge base with some information (e.g. a
   document, an index, product documentation, the global threat report,
   etc.).
3. Open the security assistant.
4. Ask it the following questions about data in the knowledge base:
   - What is an Elasticsearch cold tier? (Make sure product docs are
     installed.)
   - What topics are covered in the security lab content?
   - Ask it a question about one of your knowledge base documents.
   - Which platform did my most recent alert happen on? (Make sure you
     have a recent alert.)

## How does it work (on a high level)?

Citations are stored inside the
[ContentReferencesStore](https://github.com/elastic/kibana/pull/206683/files#diff-baf03ce192db4f13999748a38b4d920428358a4ffc62527a1d6ac0d9b234f306R17)
object. When tools are called, the tools [add citations to the
ContentReferencesStore](https://github.com/elastic/kibana/pull/206683/files#diff-5a333fdd9bf864dced06500263577e495c95c9b32c7dae9074090775df542d22R97-R99)
and pass the IDs of the [ContentReferences back to the
LLM](https://github.com/elastic/kibana/pull/206683/files#diff-5a333fdd9bf864dced06500263577e495c95c9b32c7dae9074090775df542d22R102)
alongside the result of the tool. The LLM can then use those
contentReference IDs in its response, for example:
```
The sky is blue {reference(12345)}
```
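The server-side half of this flow (a tool records a citation, receives an ID, and returns that ID to the LLM next to its result) can be sketched as follows. This is a minimal, hypothetical sketch: `ContentReference`, the `add`/`get`/`getAll` methods, and `knowledgeBaseTool` are illustrative names, not the actual Kibana implementation linked above, and the IDs are sequential only to keep the example deterministic.

```typescript
// Hypothetical citation record; the real Kibana types may differ.
interface ContentReference {
  id: string;
  type: string; // e.g. "KnowledgeBaseEntry" (illustrative value)
  label: string;
}

// Minimal in-memory store that hands out an ID per recorded citation.
class ContentReferencesStore {
  private readonly references = new Map<string, ContentReference>();
  private counter = 0;

  // Records a citation and returns it, including its generated ID.
  add(type: string, label: string): ContentReference {
    this.counter += 1;
    const ref: ContentReference = { id: String(this.counter), type, label };
    this.references.set(ref.id, ref);
    return ref;
  }

  get(id: string): ContentReference | undefined {
    return this.references.get(id);
  }

  getAll(): ContentReference[] {
    return Array.from(this.references.values());
  }
}

// Hypothetical tool: produces a result, records a citation for it, and
// returns the citation marker to the LLM alongside the result.
function knowledgeBaseTool(store: ContentReferencesStore, query: string): string {
  const result = `Entry matching "${query}": The sky is blue.`;
  const ref = store.add("KnowledgeBaseEntry", "Knowledge base entry");
  return JSON.stringify({ citation: `{reference(${ref.id})}`, result });
}

const store = new ContentReferencesStore();
console.log(knowledgeBaseTool(store, "sky colour"));
```

Keeping the store outside the LLM call means, in this sketch, that the full set of references from `store.getAll()` can travel with the final assistant message so the client can resolve each marker.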
The web client [parses out the
contentReference](https://github.com/elastic/kibana/pull/206683/files#diff-3a5c8305ac899a9e78903b0b60141dd997ba61e87342de2b9ec377165d99cfe6R23)
(`{reference(12345)}`) from the assistant message and [replaces it with
the citation React
component](https://github.com/elastic/kibana/pull/206683/files#diff-db928fb87a862e3ebf7247baefc418de539f9c0f3fc5134a2ef56f921a52bdcbR125-R129).
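On the client side, the parsing step could look roughly like this hypothetical sketch, assuming the `{reference(ID)}` marker format shown above. `parseContentReferences` and `renderWithNumbers` are illustrative names; the real client renders a React citation component, whereas this sketch substitutes a `[n]` placeholder numbered by first appearance.

```typescript
// One piece of an assistant message: plain text or a citation marker.
type MessageSegment =
  | { kind: "text"; text: string }
  | { kind: "citation"; referenceId: string };

// Splits a message on markers such as {reference(12345)}.
function parseContentReferences(message: string): MessageSegment[] {
  // Created per call so the global regex's lastIndex state is never shared.
  const pattern = /\{reference\(([^)]+)\)\}/g;
  const segments: MessageSegment[] = [];
  let lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(message)) !== null) {
    if (match.index > lastIndex) {
      segments.push({ kind: "text", text: message.slice(lastIndex, match.index) });
    }
    segments.push({ kind: "citation", referenceId: match[1] });
    lastIndex = match.index + match[0].length;
  }
  if (lastIndex < message.length) {
    segments.push({ kind: "text", text: message.slice(lastIndex) });
  }
  return segments;
}

// Numbers citations by first appearance; [n] stands in for a superscript.
function renderWithNumbers(segments: MessageSegment[]): string {
  const numbers = new Map<string, number>();
  return segments
    .map((segment) => {
      if (segment.kind === "text") return segment.text;
      if (!numbers.has(segment.referenceId)) {
        numbers.set(segment.referenceId, numbers.size + 1);
      }
      return `[${numbers.get(segment.referenceId)}]`;
    })
    .join("");
}

console.log(renderWithNumbers(parseContentReferences("The sky is blue {reference(12345)}")));
// prints "The sky is blue [1]"
```

Reusing the same number when an ID is cited twice keeps the superscripts stable, which is one plausible choice; the actual component may number citations differently.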

### Tools that are cited:

Citations are included for the following tools:
- `alert_counts_tool` -> cites the alerts page
- `knowledge_base_retrieval_tool` -> cites the knowledge base management
  page with the specific entry pre-filtered
- `open_and_acknowledged_alerts_tool` -> cites the specific alert
- `security_labs_tool` -> cites the knowledge base management page with
  the specific entry pre-filtered
- knowledge_base indices -> opens the ESQL view, selecting the particular
  document used
- `product_documentation` -> cites the documentation
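One way to model these citation targets is a discriminated union, so that each cited source carries only the data its target page needs. This is a hypothetical TypeScript sketch: the variant names, fields, and `labelFor` are assumptions for illustration, not the schema added in this PR.

```typescript
// Hypothetical citation targets, one variant per kind of cited source above.
type ContentReferenceTarget =
  | { type: "AlertsPage" } // alert_counts_tool
  | { type: "KnowledgeBaseEntry"; entryId: string } // knowledge_base_retrieval_tool, security_labs_tool
  | { type: "SecurityAlert"; alertId: string } // open_and_acknowledged_alerts_tool
  | { type: "EsqlQuery"; query: string } // knowledge_base indices
  | { type: "ProductDocumentation"; title: string; url: string }; // product_documentation

// Builds a hover label for a citation; the never check enforces exhaustiveness.
function labelFor(target: ContentReferenceTarget): string {
  switch (target.type) {
    case "AlertsPage":
      return "Alerts page";
    case "KnowledgeBaseEntry":
      return `Knowledge base entry ${target.entryId}`;
    case "SecurityAlert":
      return `Alert ${target.alertId}`;
    case "EsqlQuery":
      return `ES|QL query: ${target.query}`;
    case "ProductDocumentation":
      return target.title;
    default: {
      const unreachable: never = target;
      return unreachable;
    }
  }
}

console.log(labelFor({ type: "SecurityAlert", alertId: "abc123" })); // prints "Alert abc123"
```

With this shape, adding a new cited tool without handling it in `labelFor` becomes a compile-time error rather than a silent missing label.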

### Endpoints impacted
- `POST /internal/elastic_assistant/actions/connector/{connectorId}/_execute`
- `POST /api/security_ai_assistant/chat/complete`
- `GET /api/security_ai_assistant/current_user/conversations/_find`
- `GET /api/security_ai_assistant/current_user/conversations/:id`
- `PUT /api/security_ai_assistant/current_user/conversations/{id}`

### Considerations:
- One of the main objectives of this feature was to produce in-text
citations to create a great user experience. Multiple approaches were
tested to do this reliably. Attempts were made to make the LLM return
structured JSON containing the citations; however, this was unreliable
with smaller models. Generation post-processing (issuing an additional
LLM call to annotate the response with citations) was also explored;
however, this had limitations too, because the second LLM call would
not contain enough contextual information to create the citations
reliably. Eventually, the approach described in the section above was
used alongside few-shot prompting.
- Instead of using the ContentReferencesStore to store citations, the
LangGraph state could be used to save the citations. I looked at doing
this, but there are currently a few blockers in the LangGraph API that
prevent it:
  - LangGraph must be updated to `@langchain/langgraph>=0.2.31` to get
    access to the `Command` type so that tools can update the graph state.
  - It seems that DynamicStructuredTools do not support the `Command`
    type yet. This is something that we can clarify with the LangChain
    team.

  Once these blockers have been addressed, ContentReferencesStore could
  easily be refactored to the graph state.
- The feature has been put behind a feature flag so we can test during
the feature freeze and sync the release with the documentation update.
The only thing that is not behind a feature flag is the new anonymization
button in the settings menu (I don't think gating it is necessary, and
doing so would require a lot more code changes).
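The few-shot prompting mentioned in the first consideration can be sketched as follows. This is a hypothetical TypeScript sketch: the instruction wording, the `FewShotExample` shape, and `buildCitationPrompt` are illustrative assumptions, not the prompt shipped in this PR. It shows the general idea of pairing an explicit rule with a worked example that contains a copyable `{reference(ID)}` marker.

```typescript
// Hypothetical few-shot example pairing a tool result with a cited answer.
interface FewShotExample {
  toolResult: string;
  answer: string;
}

// Illustrative example only; not the prompt text used in Kibana.
const EXAMPLES: FewShotExample[] = [
  {
    toolResult: '{"citation":"{reference(ab12)}","result":"Hosts in group A run Linux."}',
    answer: "The hosts in group A run Linux {reference(ab12)}.",
  },
];

// Builds citation instructions followed by numbered few-shot examples.
function buildCitationPrompt(examples: FewShotExample[]): string {
  const lines = [
    "When you use information from a tool result, copy its citation marker,",
    "for example {reference(ID)}, directly after the sentence it supports.",
    "Only use reference IDs that appear in tool results.",
  ];
  examples.forEach((example, i) => {
    lines.push(`Example ${i + 1} tool result: ${example.toolResult}`);
    lines.push(`Example ${i + 1} answer: ${example.answer}`);
  });
  return lines.join("\n");
}

console.log(buildCitationPrompt(EXAMPLES));
```

A production prompt would likely include several examples covering different tools; a single example is used here to keep the sketch short.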

In some cases, you may need to nudge the LLM to include citations by
appending "Include citations" to your message.

![image](https://github.com/user-attachments/assets/e87b010b-4c29-48c7-8b2b-f17ad1878b8b)

Furthermore, the settings menu has been updated to include anonymized
values and citation toggles:

![image](https://github.com/user-attachments/assets/efcbabe5-4325-4b6b-b387-84295cb0fb70)

### Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

- [X] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [X] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [X] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [X] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard
to test bugs, performance regression, potential of data loss.

Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

## Release note

Adds in-text citations to security solution AI assistant responses.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Patryk Kopycinski <contact@patrykkopycinski.com>
(cherry picked from commit 5f888d0)

# Conflicts:
#	x-pack/platform/packages/shared/kbn-elastic-assistant/impl/assistant/assistant_header/index.tsx
#	x-pack/platform/packages/shared/kbn-elastic-assistant/impl/assistant/index.tsx
#	x-pack/platform/packages/shared/kbn-elastic-assistant/impl/assistant/settings/settings_context_menu/settings_context_menu.tsx
#	x-pack/solutions/security/plugins/security_solution/public/assistant/get_comments/stream/message_text.tsx
@KDKHD added the `backport` label (This PR is a backport of another PR) Jan 29, 2025
@KDKHD requested a review from kibanamachine as a code owner January 29, 2025 02:11
@KDKHD enabled auto-merge (squash) January 29, 2025 02:11
@elasticmachine (Contributor)

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| integrationAssistant | 473 | 477 | +4 |
| securitySolution | 6625 | 6639 | +14 |
| total | | | +18 |

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/elastic-assistant-common | 414 | 428 | +14 |
| elasticAssistant | 40 | 41 | +1 |
| total | | | +15 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 18.5MB | 18.6MB | +9.7KB |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 87.6KB | 87.7KB | +28.0B |

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/saved-objects-service.html#_mappings

| id | before | after | diff |
| --- | --- | --- | --- |
| _inference_fields | - | 1 | +1 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/elastic-assistant-common | 451 | 502 | +51 |
| elasticAssistant | 55 | 56 | +1 |
| total | | | +52 |

History

@KDKHD requested a review from a team January 29, 2025 08:16
@KDKHD merged commit cb595ef into elastic:8.x Jan 29, 2025

Labels

backport This PR is a backport of another PR


4 participants