[UII] Set up knowledge index entry for .integration_knowledge #231866
jen-huang wants to merge 8 commits into elastic:main
    if (!entryExists) {
      logger.debug('Creating integration knowledge index entry...');

      const entry = await kbDataClient.createKnowledgeBaseEntry({
AFAIU, this will be executed as the current user at the time of the request context factory creation. My concerns:
1. I hoped that I could use a system/internal user instead, but that doesn't appear to be a concept with these clients. It feels right to use a system user because we expect this entry to always exist as long as the AI Assistant is available. Should a way to use a system user be introduced, or is scoping to the current user not a problem?
2. Or perhaps this setup lives in the wrong place? If this were Fleet I would expect this code to execute during plugin start/setup, but kbDataClient is not initialized there.
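The check-then-create pattern in the snippet under review can be sketched as follows. This is only an illustration: KbClientLike, findIndexEntry, the entry fields, and the in-memory client are hypothetical stand-ins, and the real createKnowledgeBaseEntry() signature may differ.

```typescript
// Hypothetical, minimal shapes for illustration only; the real Kibana types differ.
interface IndexEntryParams {
  type: 'index';
  name: string;
  index: string;
  global: boolean;
}

interface KbClientLike {
  // Assumed lookup helper; not a confirmed method of the real client.
  findIndexEntry(index: string): Promise<IndexEntryParams | undefined>;
  // Name taken from the PR; the parameter shape here is an assumption.
  createKnowledgeBaseEntry(params: IndexEntryParams): Promise<IndexEntryParams>;
}

interface LoggerLike {
  debug(message: string): void;
}

const INTEGRATION_KNOWLEDGE_INDEX = '.integration_knowledge';

// Creates the entry only when it is missing, so repeated calls are safe.
// Returns true when an entry was created and false when it already existed.
async function ensureIntegrationKnowledgeIndexEntry(
  kbDataClient: KbClientLike,
  logger: LoggerLike
): Promise<boolean> {
  const existing = await kbDataClient.findIndexEntry(INTEGRATION_KNOWLEDGE_INDEX);
  if (existing !== undefined) {
    return false;
  }
  logger.debug('Creating integration knowledge index entry...');
  await kbDataClient.createKnowledgeBaseEntry({
    type: 'index',
    name: 'Integration knowledge', // hypothetical display name
    index: INTEGRATION_KNOWLEDGE_INDEX,
    global: true,
  });
  return true;
}

// In-memory stand-in for the client, useful for trying the function out.
function createInMemoryKbClient(): KbClientLike & { entries: IndexEntryParams[] } {
  const entries: IndexEntryParams[] = [];
  return {
    entries,
    async findIndexEntry(index: string): Promise<IndexEntryParams | undefined> {
      return entries.find((e) => e.index === index);
    },
    async createKnowledgeBaseEntry(params: IndexEntryParams): Promise<IndexEntryParams> {
      entries.push(params);
      return params;
    },
  };
}
```

Because the function is idempotent, it can run on every client creation without producing duplicate entries, whichever user ends up executing it.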
This is actually executed with esClient.asInternalUser (usage, plugin init), however this path still verifies that the user is authenticated and has global privileges, which might not be the case, and so this would error out. Looks like you can bypass this though and go directly to the underlying createKnowledgeBaseEntry() implementation and pass a dummy user object.
I think this should work, as the IndexEntry is marked as global and shouldn't be constrained by any user filters, but I'm hesitant as we don't do this anywhere else -- the security labs content still appears to be installed as the current user as well, so that is something that needs to be fixed here...
WRT 2., we actually do all our index setup during plugin start/setup via the AIAssistantService, so we could technically do this over there once all the assets have been installed. The reason we don't do this for the Security Labs content is that they're DocumentEntries and so have semantic_text field content, so we need to ensure ELSER is deployed and ready beforehand (which happens as part of KB setup).
I'm actually just about to pick up a major refactor of the kbDataClient to add multilingual support for 9.2 (by adding support for arbitrary inferenceIds), so I can try to address some of these ergonomics as part of that. Please let me know if there's anything else you're tracking here that might be helpful.
> Looks like you can bypass this though and go directly to the underlying createKnowledgeBaseEntry() implementation and pass a dummy user object.
> I think this should work, as the IndexEntry is marked as global and shouldn't be constrained by any user filters, but I'm hesitant as we don't do this anywhere else
I tried this approach but the pattern of using a dummy user felt quite odd, so I reverted it and left the implementation as is.
> WRT 2., we actually do all our index setup during plugin start/setup via the AIAssistantService, so we could technically do this over there once all the assets have been installed. The reason we don't do this for the Security Labs content is that they're DocumentEntries and so have semantic_text field content, so we need to ensure ELSER is deployed and ready beforehand (which happens as part of KB setup).
what do you mean once all the assets have been installed? is it still relevant after the ES work in elastic/elasticsearch#132506 and for elastic/elasticsearch#133171?
I moved the call to ensureIntegrationKnowledgeIndexEntry to be executed first in setupKnowledgeBase so that it's always executed regardless of the ML nodes, ELSER readiness, etc. do you see any issues with that?
on the Fleet side, we don't do any additional checks before we install a package's KB contents (push documents to .integration_knowledge index)
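The ordering described above can be sketched roughly as below. Only setupKnowledgeBase and ensureIntegrationKnowledgeIndexEntry are names from this PR; SetupDeps, isElserReady, installDocumentEntries, and the result shape are hypothetical and stand in for whatever the real setup does.

```typescript
// Hypothetical dependencies; only the two function names mentioned in the PR are real.
interface SetupDeps {
  ensureIntegrationKnowledgeIndexEntry(): Promise<void>;
  isElserReady(): Promise<boolean>; // assumed readiness check
  installDocumentEntries(): Promise<void>; // assumed ELSER-dependent step
}

interface SetupResult {
  indexEntryEnsured: boolean;
  documentEntriesInstalled: boolean;
}

async function setupKnowledgeBase(deps: SetupDeps): Promise<SetupResult> {
  // Runs first: the IndexEntry record itself carries no semantic_text content,
  // so it does not have to wait for ML nodes or ELSER.
  await deps.ensureIntegrationKnowledgeIndexEntry();

  // DocumentEntries contain semantic_text fields, so they wait for ELSER.
  if (!(await deps.isElserReady())) {
    return { indexEntryEnsured: true, documentEntriesInstalled: false };
  }
  await deps.installDocumentEntries();
  return { indexEntryEnsured: true, documentEntriesInstalled: true };
}
```

With this ordering, an environment where ELSER never becomes ready still ends up with the index entry in place.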
> what do you mean once all the assets have been installed? is it still relevant after the ES work in elastic/elasticsearch#132506 and for elastic/elasticsearch#133171?
By assets I just meant all the index/component templates and such. I was just commenting that we could create the initial IndexEntry record after that is complete. This is independent of those elasticsearch PRs.
> I moved the call to ensureIntegrationKnowledgeIndexEntry to be executed first in setupKnowledgeBase so that it's always executed regardless of the ML nodes, ELSER readiness, etc. do you see any issues with that?
This is fine. As far as I understand (need to test/confirm though), document creates with no value for the semantic_text will not result in an inference call and should succeed. Which is not the case for other forms of updates (docs).
We actually just got confirmation this week from Product that we can tie assistant features to inference API availability, so that means the KB setup process is going to go away. We can now always assume we'll have access to an inference endpoint (and so can ingest documents containing semantic_text at any time). That said, based on @sorenlouv's post over here (elastic/elasticsearch#133171 (comment)), we may want to re-work our approach here, since this index must be queried with the internal esClient, which isn't currently being passed through getStructuredToolForIndexEntry() (so we'd need a special case for this specific IndexEntry).
Let's chat when you have a moment and we can see what Søren has to say on that issue as well.
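One possible shape for that special case is sketched below. It assumes the tool builder is handed both a current-user and an internal-user client and picks between them by index name. selectSearchClient, SearchClientLike, and INTERNAL_ONLY_INDICES are hypothetical; the real getStructuredToolForIndexEntry() signature is not shown in this thread.

```typescript
// Hypothetical shapes; the property names mirror the asCurrentUser/asInternalUser
// convention, but the client type here is a stand-in for illustration.
interface SearchClientLike {
  readonly kind: 'currentUser' | 'internalUser';
}

interface ScopedClients {
  asCurrentUser: SearchClientLike;
  asInternalUser: SearchClientLike;
}

// Indices assumed to be queryable only by the internal Kibana user.
const INTERNAL_ONLY_INDICES: ReadonlySet<string> = new Set(['.integration_knowledge']);

// Picks the client a tool should search with for a given IndexEntry index.
function selectSearchClient(index: string, clients: ScopedClients): SearchClientLike {
  return INTERNAL_ONLY_INDICES.has(index) ? clients.asInternalUser : clients.asCurrentUser;
}
```

Keeping the decision in one small function would keep the special case visible and easy to remove if the index permissions change.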
@spong / @elastic/security-generative-ai I would appreciate any early feedback on this PR since my approach comes from a very elementary understanding of the AI assistant plugins :) I left a self-review to highlight areas of potential concerns and questions that I have. I am happy to change anything and everything!
Pinging @elastic/fleet (Team:Fleet)
💚 Build Succeeded
cc @jen-huang
Summary
Resolves https://github.com/elastic/ingest-dev/issues/5679.
This PR ensures that a knowledge index entry for the new index .integration_knowledge always exists whenever an AIAssistantKnowledgeBaseDataClient is created. This index stores information about Fleet-installed integrations. It is created by ES (see elastic/elasticsearch#132506) and used by Fleet during package installation (see #230107 - not yet merged!).
Release note
If a Fleet-installed integration contains knowledge base content, the Security AI Assistant now reads from this content for context about integrations.
Checklist
- release_note:* label is applied per the guidelines
- backport:* labels.