[Security Assistant] Fixes issue preventing the creation of Knowledge Base Index Entries in deployments with a large number of indices/mappings #231376
Thanks for the fix!! Changes LGTM
Below are local testing results
Testing setup
Results
1. Who are the authors of the GTR 2024?
🟡 Assistant was not able to find authors of the document.
First of all, I remember this working well some versions ago, when the assistant was able to list all the contributors. It looks like something has changed in the process of converting user input into a meaningful tool input for the KB tool call. Right now (in the case of GPT-4.1) the question above is being converted into a
When I tried to search within the index using the original question, I was able to see the authors as part of the highlighted fragments:
Semantic search
GET elastic-global-threat-report-2024/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "Who are the authors of the GTR 2024?"
}
}
]
}
},
"highlight": {
"fields": {
"content": {
"number_of_fragments": 2,
"order": "score"
}
}
}
}
Output
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 8.70585,
"hits": [
{
"_index": "elastic-global-threat-report-2024",
"_id": "u5x4nZgB7B6Is1yyNd5u",
"_score": 8.70585,
"_ignored": [
"attachment.content.keyword"
],
"_source": {
"attachment": {
"date": "2024-09-30T21:51:48Z",
"content_type": "application/pdf",
"format": "application/pdf; version=1.3",
"modified": "2024-10-01T09:38:32Z",
"language": "en",
"metadata_date": "2024-10-01T09:38:32Z",
"creator_tool": "Adobe InDesign 19.5 (Windows)",
"content": """..."""
},
"highlight": {
"content": [
"""2024
Global
Threat�
Report
Table of Contents
Introduction
Generative AI
Threat overview
Augmenting defenders
Malware Detections
Distribution by operation system
Malware categories
Endpoint Behaviors
Distribution by operating system
Distribution by tactic
Cloud Security
Distribution by cloud service provider
Benchmarking cloud security posture
Threat Profiles
REF5961 — BLOODALCHEMY, RUDEBIRD, EAGERBEE, DOWNTOWN
REF8207 — GHOSTPULSE
REF4578 — GHOSTENGINE
REF7001 — KANDYKORN
REF6127 — WARMCOOKIE
Responding to 2023 Forecasts
Forecasts and Recommendations
Conclusion
03
04
04
05
06
06
07
11
11
12
36
37
47
57
58
61
64
66
69
72
75
79
1
2
3
4
5
6
7
8
9
2024 Elastic Global Threat Report
Introduction1
2024 Elastic Global Threat Report
03
Introduction
With the best technologies, the most
widespread information distribution, and the
greatest public awareness of threats all in
motion, the security environment is stronger
than ever. Yet, almost in spite of these things,
threat ecosystems are thriving like never before.
Truthfully, the threat landscape is dynamic
and reactive — a new technique empowers
a previously unknown threat group, vendors
swarm to mitigate that threat and create new
technologies in the process, operators on both
""",
"""https://www.elastic.co/security
https://www.elastic.co/security-labs
https://x.com/elasticseclabs
Conclusion9
2024 Elastic Global Threat Report
80
The 2024 Elastic Global Threat Report features insights and expertise from across the
Elastic organization. We’d like to thank the following Elasticians for their contributions:
Mika Ayenson
Samir Bousseaden
Terrance DeJesus
Chris Donaher
Tinsae Erkailo
Ayoub Faouzi
Eric Forte
Ruben Groenewoud
Justin Ibarra
Devon Kerr
Jake King
Shashank Suryanarayana
Mark Mager
Asuka Nakajima
Andrew Pease
John Uhlmann
Alyssa VanNice
Colson Wilhoit
© 2024. Elasticsearch B.V. All Rights Reserved.
Elastic, Elasticsearch and other related marks are trademarks, logos or registered trademarks of Elasticsearch B.V. in the United States
and other countries. Microsoft, Azure, Windows and other related marks are trademarks of the Microsoft group of companies. Amazon
Web Services, AWS, and other related marks are trademarks of Amazon.com, Inc. or its affiliates. All other brand names, product
names, or trademarks belong to their respective owners.
2024
Global
Threat�
Report
TOC
Introduction
Generative AI
Threat overview
Augmenting defenders
Malware Detections
Malware categories
Endpoint Behaviors
Distribution by tactic
Cloud Security
Distribution by cloud service provider
Benchmarking cloud security posture
Threat Profiles
REF5961
REF8207
REF4578
REF7001
REF6127
Responding to 2023 Forecasts
Forecasts and Recommendations
Conclusion"""
]
}
}
]
}
}
This issue is not related to these changes, but wanted to highlight it. We might want to have a look into this and see how we can improve the experience.
2. What is the forecast for the coming year in GTR 2024?
✅ Great answer based on KB index entry
3. What are top 10 Process Injection by rules in Windows endpoints in GTR 2024?
✅ Great answer based on KB index entry
4. What is the most widely adopted cloud service provider this year according to GTR 2024?
✅ Great answer based on KB index entry
5. Give a brief conclusion of the GTR 2024
✅ Great answer based on KB index entry
|
florent-leborgne
left a comment
Doc link LGTM 🔗
e40pud
left a comment
Thanks for all changes!! LGTM 🚀
💚 Build Succeeded
cc @spong |
Alrighty @e40pud! 👋
Vibed with gemini-cli and here's a nice little node script for generating a buncha indices/mappings. It generates indices with names starting with a-z (for testing sorting/filtering), then generates mappings using all the different field types. Save the below to a file:
populate_es.js
#!/usr/bin/env node
const http = require('http');
const https = require('https');
const readline = require('readline');
function parseArgs() {
const args = process.argv.slice(2).reduce((acc, arg, i, arr) => {
if (arg.startsWith('--')) {
const key = arg.slice(2);
const next = arr[i + 1];
if (next && !next.startsWith('--')) {
acc[key] = next;
} else {
acc[key] = true;
}
}
return acc;
}, {});
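// The reducer above only collects `--` flags, so check for the short `-h` form directly.
if (args.help || process.argv.includes('-h')) {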
console.log(`
Elasticsearch Index/Mapping Populator and Cleanup Script
Usage:
node populate_es.js [options]
node populate_es.js --cleanup
node populate_es.js --delete-by-count <number>
Description:
This script stress-tests an Elasticsearch instance by creating a large number
of indices with many fields. It can also clean up the indices it creates.
Creation Options:
--host <url> Elasticsearch host URL (default: http://localhost:9200)
--user <username> Username for basic auth (default: elastic)
--pass <password> Password for basic auth (default: changeme)
--apiKey <key> API key for authentication (overrides user/pass)
--indices <number> Number of indices to create (default: 5000)
--mappings <number> Number of mappings per index (default: 5000)
--maxFields <number> The max number of fields per index (default: same as --mappings)
--shards <number> Number of primary shards per index (default: 1)
--replicas <number> Number of replicas per index (default: 0)
Cleanup & Recovery Options:
--cleanup Delete all indices created by this script.
--delete-by-count <N> Delete the <N> newest stress-test indices.
--yes Bypass confirmation prompt during cleanup.
Other Options:
-h, --help Show this help message
`);
process.exit(0);
}
return {
host: args.host || 'http://localhost:9200',
user: args.user || 'elastic',
pass: args.pass || 'changeme',
apiKey: args.apiKey,
indices: parseInt(args.indices, 10) || 5000,
mappings: parseInt(args.mappings, 10) || 5000,
maxFields: parseInt(args.maxFields, 10) || parseInt(args.mappings, 10) || 5000,
shards: parseInt(args.shards, 10) || 1,
replicas: parseInt(args.replicas, 10) || 0,
cleanup: !!args.cleanup,
deleteByCount: parseInt(args['delete-by-count'], 10) || 0,
yes: !!args.yes,
};
}
const config = parseArgs();
const simpleFieldTypes = [
{ type: 'text' },
{ type: 'keyword' },
{ type: 'long' },
{ type: 'integer' },
{ type: 'short' },
{ type: 'byte' },
{ type: 'double' },
{ type: 'float' },
{ type: 'half_float' },
{ type: 'scaled_float', scaling_factor: 100 },
{ type: 'date' },
{ type: 'date_nanos' },
{ type: 'boolean' },
{ type: 'binary' },
{ type: 'geo_point' },
{ type: 'ip' },
{ type: 'completion' },
{ type: 'token_count', analyzer: 'standard' },
];
const complexFieldTypes = [
{ type: 'integer_range' },
{ type: 'float_range' },
{ type: 'long_range' },
{ type: 'double_range' },
{ type: 'date_range' },
{ type: 'geo_shape' },
{ type: 'search_as_you_type' },
{ type: 'dense_vector', dims: 4 },
{ type: 'semantic_text' },
];
function generateIndexBody(numMappings, maxFields, numShards, numReplicas) {
const properties = {};
let fieldCount = 0;
for (const fieldType of complexFieldTypes) {
if (fieldCount >= numMappings) break;
properties[`complex_${fieldType.type}_${fieldCount}`] = { ...fieldType };
fieldCount++;
}
while (fieldCount < numMappings) {
const fieldTypeDefinition = simpleFieldTypes[fieldCount % simpleFieldTypes.length];
properties[`field_${fieldCount}`] = { ...fieldTypeDefinition };
fieldCount++;
}
return {
settings: {
'index.mapping.total_fields.limit': maxFields,
'index.number_of_shards': numShards,
'index.number_of_replicas': numReplicas,
},
mappings: { properties },
};
}
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function makeRequest(method, path, body, retries = 3, delay = 1000) {
for (let i = 0; i < retries; i++) {
try {
return await new Promise((resolve, reject) => {
const url = new URL(config.host);
const protocol = url.protocol === 'https:' ? https : http;
const options = {
hostname: url.hostname,
port: url.port,
path,
method,
headers: { 'Content-Type': 'application/json' },
};
if (config.apiKey) {
options.headers.Authorization = `ApiKey ${config.apiKey}`;
} else if (config.user && config.pass) {
const auth = 'Basic ' + Buffer.from(config.user + ':' + config.pass).toString('base64');
options.headers.Authorization = auth;
}
const req = protocol.request(options, (res) => {
let data = '';
res.on('data', (chunk) => (data += chunk));
res.on('end', () => {
if (res.statusCode >= 200 && res.statusCode < 300) {
try {
resolve({ statusCode: res.statusCode, body: JSON.parse(data || '{}') });
} catch (e) {
reject(new Error('Failed to parse JSON response.'));
}
} else {
const err = new Error(`Request failed with status code ${res.statusCode}: ${data}`);
if (data.includes('resource_already_exists_exception')) {
err.isAlreadyExists = true;
}
if ([429, 503, 504].includes(res.statusCode)) {
err.isRetryable = true;
}
reject(err);
}
});
});
req.on('error', (e) => reject(e));
if (body) req.write(JSON.stringify(body));
req.end();
});
} catch (error) {
if (error.isAlreadyExists || !error.isRetryable || i === retries - 1) {
throw error;
}
await sleep(delay);
delay *= 2;
}
}
}
async function cleanupIndices() {
console.log('Starting cleanup of stress-test indices...');
if (!config.yes) {
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
await new Promise((resolve) => {
rl.question(
'Are you sure you want to delete all indices with the pattern "*-stress-test-index-*"? (y/N) ',
(answer) => {
if (answer.toLowerCase() !== 'y') {
console.log('Cleanup cancelled.');
process.exit(0);
}
rl.close();
resolve();
}
);
});
}
try {
const { body } = await makeRequest('DELETE', '/*-stress-test-index-*');
console.log('Cleanup successful:', body);
} catch (error) {
if (error.message.includes('404')) {
console.log('No stress-test indices found to delete.');
} else {
console.error('An error occurred during cleanup:', error.message);
process.exit(1);
}
}
}
async function deleteIndicesByCount(count) {
console.log(`Fetching the ${count} newest stress-test indices to delete...`);
try {
const { body } = await makeRequest(
'GET',
`/_cat/indices/*-stress-test-index-*?h=index&s=creation.date:desc&format=json`
);
const indices = body.map((item) => item.index);
if (indices.length === 0) {
console.log('No stress-test indices found to delete.');
return;
}
const batchToDelete = indices.slice(0, count);
console.log(`Deleting ${batchToDelete.length} indices: ${batchToDelete.join(', ')}`);
await makeRequest('DELETE', `/${batchToDelete.join(',')}`);
console.log('Deletion successful.');
} catch (e) {
console.error('\n[FATAL] Could not get or delete indices:', e.message);
process.exit(1);
}
}
async function createIndices() {
console.log('Starting to populate Elasticsearch...');
console.log('Configuration:', {
...config,
pass: '***',
apiKey: config.apiKey ? '***' : undefined,
});
const alphabet = 'abcdefghijklmnopqrstuvwxyz';
let createdCount = 0;
let skippedCount = 0;
const total = config.indices;
const barWidth = 40;
for (let i = 0; i < total; i++) {
const indexName = `${alphabet[i % alphabet.length]}-stress-test-index-${String(i).padStart(
5,
'0'
)}`;
const percent = (i + 1) / total;
const filledWidth = Math.round(barWidth * percent);
const bar = `[${'█'.repeat(filledWidth)}${'-'.repeat(barWidth - filledWidth)}]`;
const percentStr = `${(percent * 100).toFixed(1)}%`;
readline.clearLine(process.stdout, 0);
readline.cursorTo(process.stdout, 0);
process.stdout.write(`${bar} ${percentStr} | [${i + 1}/${total}] Processing: ${indexName}`);
const indexBody = generateIndexBody(
config.mappings,
config.maxFields,
config.shards,
config.replicas
);
try {
await makeRequest('PUT', `/${indexName}`, indexBody);
createdCount++;
} catch (error) {
if (error.isAlreadyExists) {
skippedCount++;
continue;
}
process.stdout.write('\n');
console.error(`\n[FATAL] Failed while processing index ${indexName}:`, error.message);
console.error(
'Exiting due to a critical error. Please check your Elasticsearch cluster status and settings.'
);
process.exit(1);
}
}
process.stdout.write('\n');
console.log(
`\nPopulation complete. Created: ${createdCount}, Skipped: ${skippedCount}, Total processed: ${
createdCount + skippedCount
}.`
);
}
async function main() {
if (config.cleanup) {
await cleanupIndices();
} else if (config.deleteByCount > 0) {
await deleteIndicesByCount(config.deleteByCount);
} else {
await createIndices();
}
}
main().catch((err) => {
console.error('\nAn unexpected error occurred:', err.message);
process.exit(1);
});
and for local dev call ala:
node populate_es.js --indices 4000 --mappings 4000
or for cloud clusters:
node populate_es.js --host https://kibana-pr-231376.es.us-west2.gcp.elastic-cloud.com/ --apiKey asdf== --indices 4000 --mappings 4000
and to cleanup all the garbage it made:
node populate_es.js --cleanup --yes
and since I didn't add circuit breaker detection, if you go too far and ES/Kibana won't start, use this to start deleting chunks of indices till the system is healthy again:
node populate_es.js --delete-by-count 20
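A rough idea of what that missing circuit breaker detection could look like, as a sketch only: `nodesNearBreakerLimit` is a hypothetical helper (not part of populate_es.js) and the 0.85 threshold is an arbitrary assumption. It takes the parsed body of `GET /_nodes/stats/breaker` and lists the nodes whose parent breaker is close to its limit, so the create loop could stop before the cluster falls over.

```javascript
// Hypothetical helper, not part of populate_es.js.
// `stats` is the parsed body of GET /_nodes/stats/breaker.
// The 0.85 default threshold is an assumption; tune it for your cluster.
function nodesNearBreakerLimit(stats, threshold = 0.85) {
  return Object.entries(stats.nodes || {})
    .map(([id, node]) => {
      const parent = (node.breakers && node.breakers.parent) || {};
      const limit = parent.limit_size_in_bytes || 0;
      const used = parent.estimated_size_in_bytes || 0;
      // Ratio of estimated memory use to the parent breaker limit (0 if unknown).
      return { id, ratio: limit > 0 ? used / limit : 0 };
    })
    .filter((entry) => entry.ratio >= threshold);
}

// Example with a hand-written response fragment (values are made up).
const sample = {
  nodes: {
    nodeA: { breakers: { parent: { limit_size_in_bytes: 1000, estimated_size_in_bytes: 900 } } },
    nodeB: { breakers: { parent: { limit_size_in_bytes: 1000, estimated_size_in_bytes: 200 } } },
  },
};
console.log(nodesNearBreakerLimit(sample).map((n) => n.id)); // [ 'nodeA' ]
```

Inside the create loop this could be called every few hundred indices with the result of `makeRequest('GET', '/_nodes/stats/breaker')`, bailing out when the returned list is non-empty.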
I tested this both locally and with the ci:cloud-deploy instance linked and all was well! 🎉
Index suggestions worked without issue, and field fetching continued to work as well (even saw those getting cached in the network panel, which is nice :).
When I pushed it to the cluster limits I was seeing issues everywhere else before I could even make it to the KB UI, so I think we're good here! 😅
|
Starting backport for target branches: 9.1 https://github.com/elastic/kibana/actions/runs/16950887524 |
💔 All backports failed
Manual backport
To create the backport manually run:
Questions ?
Please refer to the Backport tool documentation |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Questions ?
Please refer to the Backport tool documentation |
… Base Index Entries in deployments with a large number of indices/mappings (elastic#231376)

## Summary

This PR fixes an issue with the Security Assistant KB Index Entries interface introduced in `9.1` where a large number of indices/mappings could result in Kibana crashing and preventing the creation of the Index Entry.

This is technically a 'fix-hancement' as we are bypassing the underlying issue altogether by switching to use common core APIs for index/field suggestions (just as is done in the Discover 'Create a data view' interface), and in turn supporting all fields of type `text`, not just `semantic_text` (elastic#230863).

### Original Issue

The underlying issue here was introduced by a change to the `field_caps` API in `9.1` (elastic/elasticsearch#127664) that resulted in the `/internal/elastic_assistant/knowledge_base/_indices` route not finding any indices with a `semantic_text` field, and thus inadvertently falling back to doing a full scan of all mappings ([source](https://github.com/elastic/kibana/blob/b128cee4ee7ccc367e8acf159dbf58a75f081867/x-pack/solutions/security/plugins/elastic_assistant/server/routes/knowledge_base/get_knowledge_base_indices.ts#L69)). A fix to this API was initially investigated, but there was no reasonable API available for fetching all occurrences of `semantic_text` fields, so with `match` queries adding support for `semantic_text` in `8.18`, it was decided to go ahead and enable support for all `text` fields.

### Fix Details

The `Index` input field now uses the `dataViews.getIndices()` API for suggestions (instead of the `useKnowledgeBaseIndices` hook/route), which is backed by the [resolve indices ES API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-resolve-index). System indices are filtered out with the `*,-.*` filter. The initial call will return all indices just as the Discover 'Create a data view' interface, which is further filtered upon as the user continues to type.
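
To illustrate the effect of the `*,-.*` pattern plus client-side narrowing, here is a minimal sketch. It does not call `dataViews.getIndices()`; `filterIndexSuggestions` is a hypothetical helper operating on a plain array of index names, and the case-insensitive substring match is an assumption about how the input narrows its suggestions.

```javascript
// Hypothetical illustration only; the real suggestions come from dataViews.getIndices().
// Mirrors the `*,-.*` pattern: keep everything except names starting with a dot,
// then narrow by whatever the user has typed so far (assumed: case-insensitive substring).
function filterIndexSuggestions(indexNames, typed = '') {
  const needle = typed.trim().toLowerCase();
  return indexNames
    .filter((name) => !name.startsWith('.'))
    .filter((name) => needle === '' || name.toLowerCase().includes(needle));
}

const names = ['.kibana_1', 'project-details', 'a-stress-test-index-00000', '.security-7'];
console.log(filterIndexSuggestions(names)); // [ 'project-details', 'a-stress-test-index-00000' ]
console.log(filterIndexSuggestions(names, 'proj')); // [ 'project-details' ]
```
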
Note: Discover makes subsequent calls upon further user input, though I'm not entirely sure this is necessary here as all indices are initially returned and available for client-side filtering within the input. I will perform further stress testing with many indices/mappings to confirm.

The `Field` input now uses the fields already queried for the `Output fields` input suggestions (via `dataViews.getFieldsForWildcard()`), just filtered to those whose `field.esTypes?.includes('text')`. This is potentially still a hot path for client-side code with many mappings, so I will also confirm this with further stress testing.

### Docs

@elastic/security-docs / @benironside, we will need to update the Security Assistant KB docs [here](https://www.elastic.co/docs/solutions/security/ai/ai-assistant-knowledge-base#knowledge-base-add-knowledge-index) to indicate that any `text` field is now supported for retrieval.

### Testing

#### Functional Testing

To confirm proper retrieval of both `text` and `semantic_text` fields we'll need to create an index, add a document, then create the KB Index Entry. Then update the KB Index Entry to reference the other field type, and test again.

<details><summary>Create Index</summary>
<p>

``` JSON
PUT project-details
{
  "mappings": {
    "properties": {
      "project_issue": {
        "type": "text"
      },
      "project_name": {
        "type": "text"
      },
      "summary": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch",
        "model_settings": {
          "service": "elasticsearch",
          "task_type": "sparse_embedding"
        }
      }
    }
  }
}
```
</p>
</details>

<details><summary>Create Sample Doc</summary>
<p>

``` JSON
PUT project-details/_doc/doc1
{
  "project_issue": "The main issue at hand is the breaking of the space plane",
  "project_name": "Issue 5",
  "summary": "This is a summary that contains the word yellow"
}
```
</p>
</details>

<details><summary>Create KB Index Entry (Text)</summary>
<p>

``` JSON
POST kbn:api/security_ai_assistant/knowledge_base/entries
{
  "type": "index",
  "name": "Project Details Tool",
  "index": "project-details",
  "field": "project_issue",
  "outputFields": [],
  "description": "Use this index to answer questions about any project details.",
  "queryDescription": "Key terms to search for from the user's prompt."
}
```
</p>
</details>

Now you can open the Assistant and perform a query like:
```
Do I have any project details about the issue at hand?
```

Which should result in a [trace like this](https://smith.langchain.com/public/208338a6-42f3-4a9d-bc4c-0ff38d06d34c/r) which calls the generated tool that will then perform a _lexical search_ against the configured index. Ensure citations work as expected for the returned document.

Now open the Index Entry in the KB Settings UI and change the `field` from `project_issue` to `summary` for testing `semantic_text`.

Open the Assistant and perform a query like:
```
Do I have any project details for Project Yellow?
```

Which should result in a [trace like this](https://smith.langchain.com/public/3041a48b-5dfd-4c89-9f51-0bcdffd38f63/r) which calls the generated tool that will then perform a _semantic search_ against the configured index. Ensure citations work as expected for the returned document.

#### Performance Testing

⚠️ In progress -- I will include a script for generating many indices/mappings for testing, and also prepare the `ci-cloud-deploy` instance with the same setup for confirmation.

### Checklist

Check the PR satisfies following conditions.

Reviewers should verify this PR satisfies this list as well.

- [X] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
  * Will coordinate with @elastic/security-docs on the docs update here.
- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>

(cherry picked from commit 4496b50)

# Conflicts:
# x-pack/platform/plugins/private/translations/translations/de-DE.json
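
The `Field` filtering described under Fix Details can be sketched as follows. This is only an illustration: the field objects are hand-written stand-ins that model just `name` and `esTypes` (the objects returned by `dataViews.getFieldsForWildcard()` carry more properties), `created_at` and `runtime_only` are made-up field names, and `textFieldNames` is a hypothetical helper name.

```javascript
// Hypothetical stand-ins for field specs; only `name` and `esTypes` are modeled here.
const fields = [
  { name: 'project_issue', esTypes: ['text'] },
  { name: 'project_name', esTypes: ['text'] },
  { name: 'created_at', esTypes: ['date'] },
  { name: 'runtime_only' }, // no esTypes at all; optional chaining keeps this safe
];

// Same predicate as quoted in the PR description: field.esTypes?.includes('text').
function textFieldNames(fieldSpecs) {
  return fieldSpecs
    .filter((field) => field.esTypes?.includes('text'))
    .map((field) => field.name);
}

console.log(textFieldNames(fields)); // [ 'project_issue', 'project_name' ]
```

Because the predicate runs over every field in the already-fetched list, its cost grows linearly with the number of mappings, which is why the description flags it as a potential client-side hot path.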
…wledge Base Index Entries in deployments with a large number of indices/mappings (#231376) (#231717) # Backport This will backport the following commits from `main` to `9.1`: - [[Security Assistant] Fixes issue preventing the creation of Knowledge Base Index Entries in deployments with a large number of indices/mappings (#231376)](#231376) <!--- Backport version: 10.0.1 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) <!--BACKPORT [{"author":{"name":"Garrett Spong","email":"[email protected]"},"sourceCommit":{"committedDate":"2025-08-13T22:33:31Z","message":"[Security Assistant] Fixes issue preventing the creation of Knowledge Base Index Entries in deployments with a large number of indices/mappings (#231376)\n\n## Summary\n\nThis PR fixes an issue with the Security Assistant KB Index Entries\ninterface introduced in `9.1` where a large number of indices/mappings\ncould result in Kibana crashing and preventing the creation of the Index\nEntry.\n\nThis is technically a 'fix-hancement' as we are bypassing the underlying\nissue altogether by switching to use common core API's for index/field\nsuggestions (just as is done in the Discover 'Create a data view'\ninterface), and in turn supporting all fields of type `text`, not just\n`semantic_text` (https://github.com/elastic/kibana/issues/230863).\n\n### Original Issue\nThe underlying issue here was introduced by a change to the `field_caps`\nAPI in `9.1` (elastic/elasticsearch#127664) that\nresulted in the `/internal/elastic_assistant/knowledge_base/_indices`\nroute not finding any indices with a `semantic_text` field, and thus\ninadvertently falling back to doing a full scan of all mappings\n([source](https://github.com/elastic/kibana/blob/b128cee4ee7ccc367e8acf159dbf58a75f081867/x-pack/solutions/security/plugins/elastic_assistant/server/routes/knowledge_base/get_knowledge_base_indices.ts#L69)).\nA fix to this API was initially investigated, but there was 
no\nreasonable API available for fetching all occurrences of `semantic_text`\nfields, so with `match` queries adding support for `semantic_text` in\n`8.18`, it was decided to go ahead and enable support for all `text`\nfields.\n\n\n### Fix Details\n\nThe `Index` input field now uses the `dataViews.getIndices()` API for\nsuggestions (instead of the `useKnowledgeBaseIndices` hook/route), which\nis backed by the [resolve indices ES\nAPI](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-resolve-index).\nSystem indices are filtered out with the `*,-.*` filter. The initial\ncall will return all indices just as the Discover 'Create a data view'\ninterface, which is further filtered upon as the user continues to type.\nNote: Discover makes subsequent calls upon further user input, though\nI'm not entirely sure this is necessary here as all indices are\ninitially returned and available for client-side filtering within the\ninput. I will perform further stress testing with many indices/mappings\nto confirm.\n\nThe `Field` input now uses the fields already queried for the `Output\nfields` input suggestions (via `dataViews.getFieldsForWildcard()`), just\nfiltered to those whose `field.esTypes?.includes('text')`. This is\npotentially still a hot path for client-side code with many mappings, so\nI will also confirm this with further stress testing.\n\n\n### Docs\n\n@elastic/security-docs / @benironside, we will need to update the\nSecurity Assistant KB docs\n[here](https://www.elastic.co/docs/solutions/security/ai/ai-assistant-knowledge-base#knowledge-base-add-knowledge-index)\nto indicate that any `text` field is now supported for retrieval.\n\n\n### Testing\n\n#### Functional Testing\n\nTo confirm proper retrieval of both `text` and `semantic_text` fields\nwe'll need to create an index, add a document, then create the KB Index\nEntry. 
Then update the KB Index Entry to reference the other field type,\nand test again.\n\n<details><summary>Create Index</summary>\n<p>\n\n``` JSON\nPUT project-details\n{\n \"mappings\": {\n \"properties\": {\n \"project_issue\": {\n \"type\": \"text\"\n },\n \"project_name\": {\n \"type\": \"text\"\n },\n \"summary\": {\n \"type\": \"semantic_text\",\n \"inference_id\": \".elser-2-elasticsearch\",\n \"model_settings\": {\n \"service\": \"elasticsearch\",\n \"task_type\": \"sparse_embedding\"\n }\n }\n }\n }\n}\n```\n</p>\n</details> \n\n<details><summary>Create Sample Doc</summary>\n<p>\n\n``` JSON\nPUT project-details/_doc/doc1\n{\n \"project_issue\": \"The main issue at hand is the breaking of the space plane\",\n \"project_name\": \"Issue 5\",\n \"summary\": \"This is a summary that contains the word yellow\"\n}\n```\n</p>\n</details> \n\n<details><summary>Create KB Index Entry (Text)</summary>\n<p>\n\n``` JSON\nPOST kbn:api/security_ai_assistant/knowledge_base/entries\n{\n \"type\": \"index\",\n \"name\": \"Project Details Tool\",\n \"index\": \"project-details\",\n \"field\": \"project_issue\",\n \"outputFields\": [],\n \"description\": \"Use this index to answer questions about any project details.\",\n \"queryDescription\": \"Key terms to search for from the user's prompt.\"\n}\n```\n</p>\n</details> \n\nNow you can open the Assistant and perform a query like:\n```\nDo I have any project details about the issue at hand?\n```\n\nWhich should result in a [trace like\nthis](https://smith.langchain.com/public/208338a6-42f3-4a9d-bc4c-0ff38d06d34c/r)\nwhich calls the generated tool that will then perform a _lexical search_\nagainst the configured index. 
Ensure citations work as expected for the\nreturned document.\n\nNow open the Index Entry in the KB Settings UI and change the `field` to\nfrom `project_issue` to `summary` for testing `semantic_text`.\n\nOpen the Assistant and perform a query like:\n```\nDo I have any project details for Project Yellow?\n```\n\nWhich should result in a [trace like\nthis](https://smith.langchain.com/public/3041a48b-5dfd-4c89-9f51-0bcdffd38f63/r)\nwhich calls the generated tool that will then perform a _semantic\nsearch_ against the configured index. Ensure citations work as expected\nfor the returned document.\n\n\n\n\n\n\n\n#### Performance Testing\n\n⚠️ In progress -- I will include a script for generating many\nindices/mappings for testing, and also prepare the `ci-cloud-deploy`\ninstance with the same setup for confirmation.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [X] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n * Will coordinate with @elastic/security-docs on the docs update here.\n- [X] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n\n---------\n\nCo-authored-by: kibanamachine <[email protected]>","sha":"4496b50237b5d6ff8acb105662912268717bf3f6","branchLabelMapping":{"^v9.2.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["bug","release_note:fix","needs_docs","sdh-linked","ci:cloud-deploy","Team:Security Generative AI","backport:version","v9.2.0","v9.1.3"],"title":"[Security 
Assistant] Fixes issue preventing the creation of Knowledge Base Index Entries in deployments with a large number of indices/mappings","number":231376,"url":"https://github.com/elastic/kibana/pull/231376","mergeCommit":{"message":"[Security Assistant] Fixes issue preventing the creation of Knowledge Base Index Entries in deployments with a large number of indices/mappings (#231376)\n\n## Summary\n\nThis PR fixes an issue with the Security Assistant KB Index Entries\ninterface introduced in `9.1` where a large number of indices/mappings\ncould result in Kibana crashing and preventing the creation of the Index\nEntry.\n\nThis is technically a 'fix-hancement' as we are bypassing the underlying\nissue altogether by switching to use common core API's for index/field\nsuggestions (just as is done in the Discover 'Create a data view'\ninterface), and in turn supporting all fields of type `text`, not just\n`semantic_text` (https://github.com/elastic/kibana/issues/230863).\n\n### Original Issue\nThe underlying issue here was introduced by a change to the `field_caps`\nAPI in `9.1` (elastic/elasticsearch#127664) that\nresulted in the `/internal/elastic_assistant/knowledge_base/_indices`\nroute not finding any indices with a `semantic_text` field, and thus\ninadvertently falling back to doing a full scan of all mappings\n([source](https://github.com/elastic/kibana/blob/b128cee4ee7ccc367e8acf159dbf58a75f081867/x-pack/solutions/security/plugins/elastic_assistant/server/routes/knowledge_base/get_knowledge_base_indices.ts#L69)).\nA fix to this API was initially investigated, but there was no\nreasonable API available for fetching all occurrences of `semantic_text`\nfields, so with `match` queries adding support for `semantic_text` in\n`8.18`, it was decided to go ahead and enable support for all `text`\nfields.\n\n\n### Fix Details\n\nThe `Index` input field now uses the `dataViews.getIndices()` API for\nsuggestions (instead of the `useKnowledgeBaseIndices` hook/route), which\nis 
backed by the [resolve indices ES\nAPI](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-resolve-index).\nSystem indices are filtered out with the `*,-.*` filter. The initial\ncall will return all indices just as the Discover 'Create a data view'\ninterface, which is further filtered upon as the user continues to type.\nNote: Discover makes subsequent calls upon further user input, though\nI'm not entirely sure this is necessary here as all indices are\ninitially returned and available for client-side filtering within the\ninput. I will perform further stress testing with many indices/mappings\nto confirm.\n\nThe `Field` input now uses the fields already queried for the `Output\nfields` input suggestions (via `dataViews.getFieldsForWildcard()`), just\nfiltered to those whose `field.esTypes?.includes('text')`. This is\npotentially still a hot path for client-side code with many mappings, so\nI will also confirm this with further stress testing.\n\n\n### Docs\n\n@elastic/security-docs / @benironside, we will need to update the\nSecurity Assistant KB docs\n[here](https://www.elastic.co/docs/solutions/security/ai/ai-assistant-knowledge-base#knowledge-base-add-knowledge-index)\nto indicate that any `text` field is now supported for retrieval.\n\n\n### Testing\n\n#### Functional Testing\n\nTo confirm proper retrieval of both `text` and `semantic_text` fields\nwe'll need to create an index, add a document, then create the KB Index\nEntry. 
Then update the KB Index Entry to reference the other field type,\nand test again.\n\n<details><summary>Create Index</summary>\n<p>\n\n``` JSON\nPUT project-details\n{\n \"mappings\": {\n \"properties\": {\n \"project_issue\": {\n \"type\": \"text\"\n },\n \"project_name\": {\n \"type\": \"text\"\n },\n \"summary\": {\n \"type\": \"semantic_text\",\n \"inference_id\": \".elser-2-elasticsearch\",\n \"model_settings\": {\n \"service\": \"elasticsearch\",\n \"task_type\": \"sparse_embedding\"\n }\n }\n }\n }\n}\n```\n</p>\n</details> \n\n<details><summary>Create Sample Doc</summary>\n<p>\n\n``` JSON\nPUT project-details/_doc/doc1\n{\n \"project_issue\": \"The main issue at hand is the breaking of the space plane\",\n \"project_name\": \"Issue 5\",\n \"summary\": \"This is a summary that contains the word yellow\"\n}\n```\n</p>\n</details> \n\n<details><summary>Create KB Index Entry (Text)</summary>\n<p>\n\n``` JSON\nPOST kbn:api/security_ai_assistant/knowledge_base/entries\n{\n \"type\": \"index\",\n \"name\": \"Project Details Tool\",\n \"index\": \"project-details\",\n \"field\": \"project_issue\",\n \"outputFields\": [],\n \"description\": \"Use this index to answer questions about any project details.\",\n \"queryDescription\": \"Key terms to search for from the user's prompt.\"\n}\n```\n</p>\n</details> \n\nNow you can open the Assistant and perform a query like:\n```\nDo I have any project details about the issue at hand?\n```\n\nWhich should result in a [trace like\nthis](https://smith.langchain.com/public/208338a6-42f3-4a9d-bc4c-0ff38d06d34c/r)\nwhich calls the generated tool that will then perform a _lexical search_\nagainst the configured index. 
Ensure citations work as expected for the\nreturned document.\n\nNow open the Index Entry in the KB Settings UI and change the `field` to\nfrom `project_issue` to `summary` for testing `semantic_text`.\n\nOpen the Assistant and perform a query like:\n```\nDo I have any project details for Project Yellow?\n```\n\nWhich should result in a [trace like\nthis](https://smith.langchain.com/public/3041a48b-5dfd-4c89-9f51-0bcdffd38f63/r)\nwhich calls the generated tool that will then perform a _semantic\nsearch_ against the configured index. Ensure citations work as expected\nfor the returned document.\n\n\n\n\n\n\n\n#### Performance Testing\n\n⚠️ In progress -- I will include a script for generating many\nindices/mappings for testing, and also prepare the `ci-cloud-deploy`\ninstance with the same setup for confirmation.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [X] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n * Will coordinate with @elastic/security-docs on the docs update here.\n- [X] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n\n---------\n\nCo-authored-by: kibanamachine <[email 
protected]>","sha":"4496b50237b5d6ff8acb105662912268717bf3f6"}},"sourceBranch":"main","suggestedTargetBranches":["9.1"],"targetPullRequestStates":[{"branch":"main","label":"v9.2.0","branchLabelMappingKey":"^v9.2.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/231376","number":231376,"mergeCommit":{"message":"[Security Assistant] Fixes issue preventing the creation of Knowledge Base Index Entries in deployments with a large number of indices/mappings (#231376)\n\n## Summary\n\nThis PR fixes an issue with the Security Assistant KB Index Entries\ninterface introduced in `9.1` where a large number of indices/mappings\ncould result in Kibana crashing and preventing the creation of the Index\nEntry.\n\nThis is technically a 'fix-hancement' as we are bypassing the underlying\nissue altogether by switching to use common core API's for index/field\nsuggestions (just as is done in the Discover 'Create a data view'\ninterface), and in turn supporting all fields of type `text`, not just\n`semantic_text` (https://github.com/elastic/kibana/issues/230863).\n\n### Original Issue\nThe underlying issue here was introduced by a change to the `field_caps`\nAPI in `9.1` (elastic/elasticsearch#127664) that\nresulted in the `/internal/elastic_assistant/knowledge_base/_indices`\nroute not finding any indices with a `semantic_text` field, and thus\ninadvertently falling back to doing a full scan of all mappings\n([source](https://github.com/elastic/kibana/blob/b128cee4ee7ccc367e8acf159dbf58a75f081867/x-pack/solutions/security/plugins/elastic_assistant/server/routes/knowledge_base/get_knowledge_base_indices.ts#L69)).\nA fix to this API was initially investigated, but there was no\nreasonable API available for fetching all occurrences of `semantic_text`\nfields, so with `match` queries adding support for `semantic_text` in\n`8.18`, it was decided to go ahead and enable support for all `text`\nfields.\n\n\n### Fix Details\n\nThe `Index` input field now 
uses the `dataViews.getIndices()` API for\nsuggestions (instead of the `useKnowledgeBaseIndices` hook/route), which\nis backed by the [resolve indices ES\nAPI](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-resolve-index).\nSystem indices are filtered out with the `*,-.*` filter. The initial\ncall will return all indices just as the Discover 'Create a data view'\ninterface, which is further filtered upon as the user continues to type.\nNote: Discover makes subsequent calls upon further user input, though\nI'm not entirely sure this is necessary here as all indices are\ninitially returned and available for client-side filtering within the\ninput. I will perform further stress testing with many indices/mappings\nto confirm.\n\nThe `Field` input now uses the fields already queried for the `Output\nfields` input suggestions (via `dataViews.getFieldsForWildcard()`), just\nfiltered to those whose `field.esTypes?.includes('text')`. This is\npotentially still a hot path for client-side code with many mappings, so\nI will also confirm this with further stress testing.\n\n\n### Docs\n\n@elastic/security-docs / @benironside, we will need to update the\nSecurity Assistant KB docs\n[here](https://www.elastic.co/docs/solutions/security/ai/ai-assistant-knowledge-base#knowledge-base-add-knowledge-index)\nto indicate that any `text` field is now supported for retrieval.\n\n\n### Testing\n\n#### Functional Testing\n\nTo confirm proper retrieval of both `text` and `semantic_text` fields\nwe'll need to create an index, add a document, then create the KB Index\nEntry. 
Then update the KB Index Entry to reference the other field type,\nand test again.\n\n<details><summary>Create Index</summary>\n<p>\n\n``` JSON\nPUT project-details\n{\n \"mappings\": {\n \"properties\": {\n \"project_issue\": {\n \"type\": \"text\"\n },\n \"project_name\": {\n \"type\": \"text\"\n },\n \"summary\": {\n \"type\": \"semantic_text\",\n \"inference_id\": \".elser-2-elasticsearch\",\n \"model_settings\": {\n \"service\": \"elasticsearch\",\n \"task_type\": \"sparse_embedding\"\n }\n }\n }\n }\n}\n```\n</p>\n</details> \n\n<details><summary>Create Sample Doc</summary>\n<p>\n\n``` JSON\nPUT project-details/_doc/doc1\n{\n \"project_issue\": \"The main issue at hand is the breaking of the space plane\",\n \"project_name\": \"Issue 5\",\n \"summary\": \"This is a summary that contains the word yellow\"\n}\n```\n</p>\n</details> \n\n<details><summary>Create KB Index Entry (Text)</summary>\n<p>\n\n``` JSON\nPOST kbn:api/security_ai_assistant/knowledge_base/entries\n{\n \"type\": \"index\",\n \"name\": \"Project Details Tool\",\n \"index\": \"project-details\",\n \"field\": \"project_issue\",\n \"outputFields\": [],\n \"description\": \"Use this index to answer questions about any project details.\",\n \"queryDescription\": \"Key terms to search for from the user's prompt.\"\n}\n```\n</p>\n</details> \n\nNow you can open the Assistant and perform a query like:\n```\nDo I have any project details about the issue at hand?\n```\n\nWhich should result in a [trace like\nthis](https://smith.langchain.com/public/208338a6-42f3-4a9d-bc4c-0ff38d06d34c/r)\nwhich calls the generated tool that will then perform a _lexical search_\nagainst the configured index. 
Ensure citations work as expected for the\nreturned document.\n\nNow open the Index Entry in the KB Settings UI and change the `field` to\nfrom `project_issue` to `summary` for testing `semantic_text`.\n\nOpen the Assistant and perform a query like:\n```\nDo I have any project details for Project Yellow?\n```\n\nWhich should result in a [trace like\nthis](https://smith.langchain.com/public/3041a48b-5dfd-4c89-9f51-0bcdffd38f63/r)\nwhich calls the generated tool that will then perform a _semantic\nsearch_ against the configured index. Ensure citations work as expected\nfor the returned document.\n\n\n\n\n\n\n\n#### Performance Testing\n\n⚠️ In progress -- I will include a script for generating many\nindices/mappings for testing, and also prepare the `ci-cloud-deploy`\ninstance with the same setup for confirmation.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [X] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n * Will coordinate with @elastic/security-docs on the docs update here.\n- [X] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n\n---------\n\nCo-authored-by: kibanamachine <[email protected]>","sha":"4496b50237b5d6ff8acb105662912268717bf3f6"}},{"branch":"9.1","label":"v9.1.3","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT-->
… Base Index Entries in deployments with a large number of indices/mappings (elastic#231376)

## Summary

This PR fixes an issue with the Security Assistant KB Index Entries interface, introduced in `9.1`, where a large number of indices/mappings could crash Kibana and prevent the creation of the Index Entry.

This is technically a 'fix-hancement', as we are bypassing the underlying issue altogether by switching to common core APIs for index/field suggestions (just as is done in the Discover 'Create a data view' interface), and in turn supporting all fields of type `text`, not just `semantic_text` (elastic#230863).

### Original Issue

The underlying issue here was introduced by a change to the `field_caps` API in `9.1` (elastic/elasticsearch#127664) that resulted in the `/internal/elastic_assistant/knowledge_base/_indices` route not finding any indices with a `semantic_text` field, and thus inadvertently falling back to a full scan of all mappings ([source](https://github.com/elastic/kibana/blob/b128cee4ee7ccc367e8acf159dbf58a75f081867/x-pack/solutions/security/plugins/elastic_assistant/server/routes/knowledge_base/get_knowledge_base_indices.ts#L69)). A fix to this API was initially investigated, but there was no reasonable API available for fetching all occurrences of `semantic_text` fields. Since `match` queries added support for `semantic_text` in `8.18`, it was decided to go ahead and enable support for all `text` fields.

### Fix Details

The `Index` input field now uses the `dataViews.getIndices()` API for suggestions (instead of the `useKnowledgeBaseIndices` hook/route), which is backed by the [resolve indices ES API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-resolve-index). System indices are filtered out with the `*,-.*` filter. The initial call returns all indices, just as in the Discover 'Create a data view' interface, and the list is filtered further as the user continues to type. Note: Discover makes subsequent calls upon further user input, though I'm not entirely sure this is necessary here, as all indices are initially returned and available for client-side filtering within the input. I will perform further stress testing with many indices/mappings to confirm.

The `Field` input now uses the fields already queried for the `Output fields` input suggestions (via `dataViews.getFieldsForWildcard()`), just filtered to those whose `field.esTypes?.includes('text')`. This is potentially still a hot path for client-side code with many mappings, so I will also confirm this with further stress testing.

### Docs

@elastic/security-docs / @benironside, we will need to update the Security Assistant KB docs [here](https://www.elastic.co/docs/solutions/security/ai/ai-assistant-knowledge-base#knowledge-base-add-knowledge-index) to indicate that any `text` field is now supported for retrieval.

### Testing

#### Functional Testing

To confirm proper retrieval of both `text` and `semantic_text` fields, we'll need to create an index, add a document, then create the KB Index Entry. Then update the KB Index Entry to reference the other field type, and test again.

<details><summary>Create Index</summary>
<p>

``` JSON
PUT project-details
{
  "mappings": {
    "properties": {
      "project_issue": {
        "type": "text"
      },
      "project_name": {
        "type": "text"
      },
      "summary": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch",
        "model_settings": {
          "service": "elasticsearch",
          "task_type": "sparse_embedding"
        }
      }
    }
  }
}
```
</p>
</details>

<details><summary>Create Sample Doc</summary>
<p>

``` JSON
PUT project-details/_doc/doc1
{
  "project_issue": "The main issue at hand is the breaking of the space plane",
  "project_name": "Issue 5",
  "summary": "This is a summary that contains the word yellow"
}
```
</p>
</details>

<details><summary>Create KB Index Entry (Text)</summary>
<p>

``` JSON
POST kbn:api/security_ai_assistant/knowledge_base/entries
{
  "type": "index",
  "name": "Project Details Tool",
  "index": "project-details",
  "field": "project_issue",
  "outputFields": [],
  "description": "Use this index to answer questions about any project details.",
  "queryDescription": "Key terms to search for from the user's prompt."
}
```
</p>
</details>

Now you can open the Assistant and perform a query like:
```
Do I have any project details about the issue at hand?
```

This should result in a [trace like this](https://smith.langchain.com/public/208338a6-42f3-4a9d-bc4c-0ff38d06d34c/r), which calls the generated tool that then performs a _lexical search_ against the configured index. Ensure citations work as expected for the returned document.

Now open the Index Entry in the KB Settings UI and change the `field` from `project_issue` to `summary` for testing `semantic_text`.

Open the Assistant and perform a query like:
```
Do I have any project details for Project Yellow?
```

This should result in a [trace like this](https://smith.langchain.com/public/3041a48b-5dfd-4c89-9f51-0bcdffd38f63/r), which calls the generated tool that then performs a _semantic search_ against the configured index. Ensure citations work as expected for the returned document.

#### Performance Testing

⚠️ In progress -- I will include a script for generating many indices/mappings for testing, and also prepare the `ci-cloud-deploy` instance with the same setup for confirmation.

### Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

- [X] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
  * Will coordinate with @elastic/security-docs on the docs update here.
- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>
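As a rough illustration of the client-side filtering described under Fix Details, the sketch below narrows a field list to `text` fields and narrows already-fetched index names as the user types. It is not the code from this PR: `FieldLike`, `filterTextFields`, and `filterIndexSuggestions` are hypothetical names, the field shape is a simplified stand-in for what `dataViews.getFieldsForWildcard()` returns, and skipping names that start with `.` is only a local approximation of the `*,-.*` pattern.

```typescript
// Hypothetical, simplified field shape (assumption); the real data views
// field spec carries many more properties.
interface FieldLike {
  name: string;
  esTypes?: string[];
}

// Keep only fields whose Elasticsearch types include `text`, mirroring the
// `field.esTypes?.includes('text')` check quoted in the PR description.
function filterTextFields(fields: FieldLike[]): FieldLike[] {
  return fields.filter((field) => field.esTypes?.includes('text') ?? false);
}

// Narrow an already-fetched list of index names as the user types.
// Dot-prefixed names are skipped as an approximation of `*,-.*` (assumption).
function filterIndexSuggestions(indexNames: string[], typed: string): string[] {
  const needle = typed.trim().toLowerCase();
  return indexNames.filter(
    (name) => !name.startsWith('.') && name.toLowerCase().includes(needle)
  );
}
```

Because both helpers are plain array filters, their cost grows linearly with the number of fields or indices, which is why the description flags this as a possible hot path worth stress testing.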
…#231725)

## Summary

This is a follow-up to #231376 where I added new `docLinks` for the Security Solution AI Assistant. This PR removes the `TODO` added in that PR and creates a new `securitySolution.aiAssistant` grouping so we can more easily add docLinks as needed.

Reviewers Note: Sorry for the extra noise here -- I thought there were more references, so I decided to do this in another PR. Turns out there were not, so this is a quick one!

…elastic#231725)

## Summary

This is a follow-up to elastic#231376 where I added new `docLinks` for the Security Solution AI Assistant. This PR removes the `TODO` added in that PR and creates a new `securitySolution.aiAssistant` grouping so we can more easily add docLinks as needed.

Reviewers Note: Sorry for the extra noise here -- I thought there were more references, so I decided to do this in another PR. Turns out there were not, so this is a quick one!

(cherry picked from commit 90470cf)

# Conflicts:
# src/platform/plugins/shared/ai_assistant_management/selection/public/routes/components/ai_assistant_selection_page.tsx
…tions (#231904)

## Summary

Small follow-up improvement to #231376, which added support for `text` fields to Index Entries. This PR adds the field type as a badge in the suggestions so users will know whether a semantic or lexical search will be performed (so they can adapt the query instructions accordingly).

Note: Needed to update the field API request from `dataViews.getFieldsForWildcard` (which called `/internal/data_views/_fields_for_wildcard`) to use `/api/index_management/mapping/[indexName]`, as the former did not have the option to include the field type. I confirmed no new privileges were necessary for this API, and the user just needs the same index privileges as before.

cc @jamesspi

Field Options:
<p align="center">
<img width="500" src="https://github.com/user-attachments/assets/f138c7f0-1d89-4946-8d27-fa6c9c49c60b" />
</p>

Output Field Options:
<p align="center">
<img width="500" src="https://github.com/user-attachments/assets/2b0395e5-d71d-43af-8a23-9bacc4b02b54" />
</p>

---

As part of this PR I've also included the helper script from #231376 for testing these large index/mapping scenarios. This script was almost entirely written in a collab session with `gemini-cli`, and is located in:

> x-pack/solutions/security/plugins/elastic_assistant/scripts

Options include:

``` bash
Elasticsearch Index/Mapping Populator and Cleanup Script

Usage:
  node stress_test_mappings.js [options]
  node stress_test_mappings.js --cleanup
  node stress_test_mappings.js --delete-by-count <number>

Description:
  This script stress-tests an Elasticsearch instance by creating a large number
  of indices with many fields. It can also clean up the indices it creates.

Creation Options:
  --host <url>           Elasticsearch host URL (default: http://localhost:9200)
  --user <username>      Username for basic auth (default: elastic)
  --pass <password>      Password for basic auth (default: changeme)
  --apiKey <key>         API key for authentication (overrides user/pass)
  --indices <number>     Number of indices to create (default: 5000)
  --mappings <number>    Number of mappings per index (default: 5000)
  --maxFields <number>   The max number of fields per index (default: same as --mappings)
  --shards <number>      Number of primary shards per index (default: 1)
  --replicas <number>    Number of replicas per index (default: 0)

Cleanup & Recovery Options:
  --cleanup              Delete all indices created by this script.
  --delete-by-count <N>  Delete the <N> newest stress-test indices.
  --yes                  Bypass confirmation prompt during cleanup.

Other Options:
  -h, --help             Show this help message
```

Some test executions are as follows. First `cd` into the assistant working directory:

```
cd x-pack/solutions/security/plugins/elastic_assistant/
```

##### Populate your local ES -- defaults to 5000 indices and 5000 mappings _per_ index. This _will cause_ a default local ES to crash, so stop early (~569), or change configuration :)

``` bash
yarn stress-test-mappings
```

##### If your ES is at its limits, you can slowly dial back the index count with the following:

``` bash
yarn stress-test-mappings --delete-by-count 50 --yes
```

##### Or clean up all the indices you created entirely with:

``` bash
yarn stress-test-mappings --cleanup --yes
```

##### And for a cloud install, create an API key and populate with the following:

``` bash
yarn stress-test-mappings --host https://stress-test.es.us-west2.gcp.elastic-cloud.com --apiKey API_KEY_HERE
```

> [!IMPORTANT]
> This is a quick utility script and may be buggy! Continue to vibe code it as you see fit, but it worked for my needs here for testing and validating this issue and fix 🙂

### Checklist

- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>
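To make the badge idea concrete, here is a minimal sketch of how field options could be derived from an index mapping, labelling each `text` field as lexical and each `semantic_text` field as semantic. It is not the implementation from this PR: `MappingProperty`, `FieldOption`, and `toFieldOptions` are hypothetical names, and the input is assumed to be the `properties` object of a standard Elasticsearch mapping rather than the exact response of `/api/index_management/mapping/[indexName]`.

```typescript
type SearchKind = 'semantic' | 'lexical';

// Simplified mapping node (assumption): only `type` and nested `properties`.
interface MappingProperty {
  type?: string;
  properties?: Record<string, MappingProperty>;
}

// Hypothetical option shape: the field path, its type (the badge text),
// and the kind of search that type would trigger.
interface FieldOption {
  name: string;
  type: string;
  searchKind: SearchKind;
}

// Walk the mapping and collect `text` and `semantic_text` fields,
// using dotted paths for nested object properties.
function toFieldOptions(
  properties: Record<string, MappingProperty>,
  prefix = ''
): FieldOption[] {
  const options: FieldOption[] = [];
  for (const key of Object.keys(properties)) {
    const prop = properties[key];
    const name = prefix ? `${prefix}.${key}` : key;
    if (prop.type === 'text' || prop.type === 'semantic_text') {
      const searchKind: SearchKind = prop.type === 'semantic_text' ? 'semantic' : 'lexical';
      options.push({ name, type: prop.type, searchKind });
    }
    if (prop.properties) {
      options.push(...toFieldOptions(prop.properties, name));
    }
  }
  return options;
}
```

Applied to the `project-details` mapping from #231376, this would label `project_issue` and `project_name` as lexical and `summary` as semantic.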
…tions (elastic#231904)

## Summary

Small follow-up improvement to elastic#231376, which added support for `text` fields to Index Entries. This PR adds the field type as a badge in the suggestions so users will know whether a semantic or lexical search will be performed (so they can adapt the query instructions accordingly).

Note: Needed to update the field API request from `dataViews.getFieldsForWildcard` (which called `/internal/data_views/_fields_for_wildcard`) to use `/api/index_management/mapping/[indexName]`, as the former did not have the option to include field type. I confirmed no new privileges were necessary for this API, and the user just needs the same index privileges as before.

cc @jamesspi

Field Options:
<p align="center">
<img width="500" src="https://github.com/user-attachments/assets/f138c7f0-1d89-4946-8d27-fa6c9c49c60b" />
</p>

Output Field Options:
<p align="center">
<img width="500" src="https://github.com/user-attachments/assets/2b0395e5-d71d-43af-8a23-9bacc4b02b54" />
</p>

---

As part of this PR I've also included the helper script from elastic#231376 for testing these large index/mapping scenarios. This script was almost entirely written in a collab session with `gemini-cli`, and is located in:

> x-pack/solutions/security/plugins/elastic_assistant/scripts

Options include:

``` bash
  Elasticsearch Index/Mapping Populator and Cleanup Script

  Usage:
    node stress_test_mappings.js [options]
    node stress_test_mappings.js --cleanup
    node stress_test_mappings.js --delete-by-count <number>

  Description:
    This script stress-tests an Elasticsearch instance by creating a large number
    of indices with many fields. It can also clean up the indices it creates.

  Creation Options:
    --host <url>           Elasticsearch host URL (default: http://localhost:9200)
    --user <username>      Username for basic auth (default: elastic)
    --pass <password>      Password for basic auth (default: changeme)
    --apiKey <key>         API key for authentication (overrides user/pass)
    --indices <number>     Number of indices to create (default: 5000)
    --mappings <number>    Number of mappings per index (default: 5000)
    --maxFields <number>   The max number of fields per index (default: same as --mappings)
    --shards <number>      Number of primary shards per index (default: 1)
    --replicas <number>    Number of replicas per index (default: 0)

  Cleanup & Recovery Options:
    --cleanup              Delete all indices created by this script.
    --delete-by-count <N>  Delete the <N> newest stress-test indices.
    --yes                  Bypass confirmation prompt during cleanup.

  Other Options:
    -h, --help             Show this help message
```

And some test executions are as follows. First `cd` into the assistant working directory:

```
cd x-pack/solutions/security/plugins/elastic_assistant/
```

##### Populate your local ES -- defaults to 5000 indices and 5000 mappings _per_ index. This _will cause_ a default local ES to crash, so stop early (~569), or change configuration :)
``` bash
yarn stress-test-mappings
```

##### If your ES is at its limits, you can slowly dial back the index count with the following:
``` bash
yarn stress-test-mappings --delete-by-count 50 --yes
```

##### Or clean up all the indices you created entirely with:
``` bash
yarn stress-test-mappings --cleanup --yes
```

##### And for a cloud install, create an API key and populate with the following:
``` bash
yarn stress-test-mappings --host https://stress-test.es.us-west2.gcp.elastic-cloud.com --apiKey API_KEY_HERE
```

> [!IMPORTANT]
> This is a quick utility script and may be buggy! Continue to vibe code it as you see fit, but it worked for my needs here for testing and validating this issue and fix 🙂

### Checklist

- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 39a6983)
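To illustrate how a field type could drive the badge described in the summary above, here is a minimal TypeScript sketch. It assumes the mapping data resembles the Elasticsearch get-mapping shape (a `properties` object with a `type` per field); the exact response of `/api/index_management/mapping/[indexName]` is not shown in this PR. `MappingProperty`, `searchKindFor` and `flattenFields` are hypothetical names, not Kibana APIs.

```typescript
// Hypothetical shape, modeled on the Elasticsearch get-mapping response.
interface MappingProperty {
  type?: string;
  properties?: Record<string, MappingProperty>;
}

type SearchKind = 'semantic' | 'lexical';

interface FieldSuggestion {
  name: string;
  type: string;
  searchKind: SearchKind;
}

// Assumption: `semantic_text` fields are searched semantically, `text` fields lexically.
function searchKindFor(type: string): SearchKind | undefined {
  if (type === 'semantic_text') return 'semantic';
  if (type === 'text') return 'lexical';
  return undefined;
}

// Walk nested object mappings and collect searchable fields with dotted paths.
function flattenFields(
  properties: Record<string, MappingProperty>,
  prefix = ''
): FieldSuggestion[] {
  const result: FieldSuggestion[] = [];
  for (const [key, prop] of Object.entries(properties)) {
    const name = prefix ? `${prefix}.${key}` : key;
    if (prop.type !== undefined) {
      const searchKind = searchKindFor(prop.type);
      if (searchKind !== undefined) {
        result.push({ name, type: prop.type, searchKind });
      }
    }
    if (prop.properties) {
      result.push(...flattenFields(prop.properties, name));
    }
  }
  return result;
}

// Example input: only text-like fields become suggestions; `created` is skipped.
const suggestions = flattenFields({
  project_issue: { type: 'text' },
  summary: { type: 'semantic_text' },
  created: { type: 'date' },
  owner: { properties: { bio: { type: 'text' } } },
});
```

Each entry in `suggestions` carries both the raw type (for the badge label) and the derived search kind, so a UI could show either.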
…suggestions (#231904) (#232674)

# Backport

This will backport the following commits from `main` to `9.1`:
- [[Security Assistant] Add field type badge to Index Entry field suggestions (#231904)](#231904)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Garrett Spong <[email protected]>
… Base Index Entries in deployments with a large number of indices/mappings (elastic#231376)

## Summary

This PR fixes an issue with the Security Assistant KB Index Entries interface introduced in `9.1` where a large number of indices/mappings could crash Kibana and prevent the creation of the Index Entry.

This is technically a 'fix-hancement', as we are bypassing the underlying issue altogether by switching to common core APIs for index/field suggestions (just as is done in the Discover 'Create a data view' interface), and in turn supporting all fields of type `text`, not just `semantic_text` (elastic#230863).

### Original Issue

The underlying issue here was introduced by a change to the `field_caps` API in `9.1` (elastic/elasticsearch#127664) that resulted in the `/internal/elastic_assistant/knowledge_base/_indices` route not finding any indices with a `semantic_text` field, and thus inadvertently falling back to a full scan of all mappings ([source](https://github.com/elastic/kibana/blob/b128cee4ee7ccc367e8acf159dbf58a75f081867/x-pack/solutions/security/plugins/elastic_assistant/server/routes/knowledge_base/get_knowledge_base_indices.ts#L69)). A fix to this API was initially investigated, but there was no reasonable API available for fetching all occurrences of `semantic_text` fields, so with `match` queries adding support for `semantic_text` in `8.18`, it was decided to go ahead and enable support for all `text` fields.

### Fix Details

The `Index` input field now uses the `dataViews.getIndices()` API for suggestions (instead of the `useKnowledgeBaseIndices` hook/route), which is backed by the [resolve indices ES API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-resolve-index). System indices are filtered out with the `*,-.*` filter. The initial call returns all indices, just as in the Discover 'Create a data view' interface, and the list is then filtered further as the user continues to type.
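As a rough illustration of the client-side narrowing described above, the TypeScript sketch below filters an already fetched list of index names as the user types. It does not call `dataViews.getIndices()`; `filterIndexSuggestions` and its `limit` parameter are hypothetical, and dropping names that start with `.` only approximates what the `*,-.*` pattern does on the Elasticsearch side.

```typescript
// Hypothetical helper: narrow a pre-fetched list of index names on the client.
function filterIndexSuggestions(
  indexNames: string[],
  userInput: string,
  limit = 50
): string[] {
  const needle = userInput.trim().toLowerCase();
  const matches: string[] = [];
  for (const name of indexNames) {
    // Approximates the `-.*` exclusion: skip dot-prefixed indices.
    if (name.startsWith('.')) continue;
    if (needle === '' || name.toLowerCase().includes(needle)) {
      matches.push(name);
      // Stop early so a very large index list cannot produce a huge suggestion list.
      if (matches.length >= limit) break;
    }
  }
  return matches;
}

const allIndices = [
  '.kibana_1',
  'project-details',
  'logs-2025',
  'elastic-global-threat-report-2024',
];
const narrowed = filterIndexSuggestions(allIndices, 'proj');
```

Capping the result count is a choice made for this sketch only; whether plain client-side filtering holds up with many indices is what the stress testing mentioned in this PR is meant to confirm.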
Note: Discover makes subsequent calls upon further user input, though I'm not entirely sure this is necessary here, as all indices are initially returned and available for client-side filtering within the input. I will perform further stress testing with many indices/mappings to confirm.

The `Field` input now uses the fields already queried for the `Output fields` input suggestions (via `dataViews.getFieldsForWildcard()`), just filtered to those whose `field.esTypes?.includes('text')`. This is potentially still a hot path for client-side code with many mappings, so I will also confirm this with further stress testing.

### Docs

@elastic/security-docs / @benironside, we will need to update the Security Assistant KB docs [here](https://www.elastic.co/docs/solutions/security/ai/ai-assistant-knowledge-base#knowledge-base-add-knowledge-index) to indicate that any `text` field is now supported for retrieval.

### Testing

#### Functional Testing

To confirm proper retrieval of both `text` and `semantic_text` fields we'll need to create an index, add a document, then create the KB Index Entry. Then update the KB Index Entry to reference the other field type, and test again.
<details><summary>Create Index</summary>
<p>

``` JSON
PUT project-details
{
  "mappings": {
    "properties": {
      "project_issue": { "type": "text" },
      "project_name": { "type": "text" },
      "summary": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch",
        "model_settings": {
          "service": "elasticsearch",
          "task_type": "sparse_embedding"
        }
      }
    }
  }
}
```

</p>
</details>

<details><summary>Create Sample Doc</summary>
<p>

``` JSON
PUT project-details/_doc/doc1
{
  "project_issue": "The main issue at hand is the breaking of the space plane",
  "project_name": "Issue 5",
  "summary": "This is a summary that contains the word yellow"
}
```

</p>
</details>

<details><summary>Create KB Index Entry (Text)</summary>
<p>

``` JSON
POST kbn:api/security_ai_assistant/knowledge_base/entries
{
  "type": "index",
  "name": "Project Details Tool",
  "index": "project-details",
  "field": "project_issue",
  "outputFields": [],
  "description": "Use this index to answer questions about any project details.",
  "queryDescription": "Key terms to search for from the user's prompt."
}
```

</p>
</details>

Now you can open the Assistant and perform a query like:

```
Do I have any project details about the issue at hand?
```

This should result in a [trace like this](https://smith.langchain.com/public/208338a6-42f3-4a9d-bc4c-0ff38d06d34c/r), which calls the generated tool that will then perform a _lexical search_ against the configured index. Ensure citations work as expected for the returned document.

Now open the Index Entry in the KB Settings UI and change the `field` from `project_issue` to `summary` for testing `semantic_text`.

Open the Assistant and perform a query like:

```
Do I have any project details for Project Yellow?
```

This should result in a [trace like this](https://smith.langchain.com/public/3041a48b-5dfd-4c89-9f51-0bcdffd38f63/r), which calls the generated tool that will then perform a _semantic search_ against the configured index. Ensure citations work as expected for the returned document.
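The search request behind such a generated tool might look roughly like the following TypeScript sketch. It assumes a plain `match` query on the configured field, which per the description above serves both `text` (lexical) and `semantic_text` (semantic) fields. `IndexEntry`, `SearchRequest`, `buildMatchQuery` and the `size` default are hypothetical and are not taken from the Assistant's actual implementation.

```typescript
// Hypothetical subset of a KB Index Entry, using the field names from the example above.
interface IndexEntry {
  index: string;
  field: string;
  outputFields: string[];
}

// Hypothetical request shape: target index, result size, and a single match query.
interface SearchRequest {
  index: string;
  size: number;
  query: { match: Record<string, string> };
  _source?: string[];
}

// Build a search request that runs a `match` query against the entry's field.
function buildMatchQuery(
  entry: IndexEntry,
  searchTerms: string,
  size = 10
): SearchRequest {
  const request: SearchRequest = {
    index: entry.index,
    size,
    query: { match: { [entry.field]: searchTerms } },
  };
  // Only restrict the returned fields when output fields were configured.
  if (entry.outputFields.length > 0) {
    request._source = entry.outputFields;
  }
  return request;
}

// Same builder for both cases; only the target field differs.
const lexical = buildMatchQuery(
  { index: 'project-details', field: 'project_issue', outputFields: [] },
  'issue at hand'
);
const semantic = buildMatchQuery(
  { index: 'project-details', field: 'summary', outputFields: ['project_name'] },
  'Project Yellow'
);
```

The point of the sketch is that the query body is identical in both cases; whether the search is lexical or semantic follows from the mapping of the chosen field.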
#### Performance Testing

⚠️ In progress -- I will include a script for generating many indices/mappings for testing, and also prepare the `ci-cloud-deploy` instance with the same setup for confirmation.

### Checklist

Check the PR satisfies following conditions.

Reviewers should verify this PR satisfies this list as well.

- [X] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
  * Will coordinate with @elastic/security-docs on the docs update here.
- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>
…elastic#231725)

## Summary

This is a follow-up to elastic#231376 where I added new `docLinks` for the Security Solution AI Assistant. This PR removes the `TODO` added in that PR and creates a new `securitySolution.aiAssistant` grouping so we can more easily add docLinks as needed.

Reviewers Note: Sorry for the extra noise here -- I thought there were more references, so I decided to do this in another PR. Turns out there were not, so this is a quick one!





## Summary

This PR fixes an issue with the Security Assistant KB Index Entries interface introduced in `9.1` where a large number of indices/mappings could result in Kibana crashing and preventing the creation of the Index Entry.

This is technically a 'fix-hancement' as we are bypassing the underlying issue altogether by switching to use common core APIs for index/field suggestions (just as is done in the Discover 'Create a data view' interface), and in turn supporting all fields of type `text`, not just `semantic_text` (#230863).

### Original Issue

The underlying issue here was introduced by a change to the `field_caps` API in `9.1` (elastic/elasticsearch#127664) that resulted in the `/internal/elastic_assistant/knowledge_base/_indices` route not finding any indices with a `semantic_text` field, and thus inadvertently falling back to doing a full scan of all mappings ([source](https://github.com/elastic/kibana/blob/b128cee4ee7ccc367e8acf159dbf58a75f081867/x-pack/solutions/security/plugins/elastic_assistant/server/routes/knowledge_base/get_knowledge_base_indices.ts#L69)). A fix to this API was initially investigated, but there was no reasonable API available for fetching all occurrences of `semantic_text` fields, so with `match` queries adding support for `semantic_text` in `8.18`, it was decided to go ahead and enable support for all `text` fields.

### Fix Details

The `Index` input field now uses the `dataViews.getIndices()` API for suggestions (instead of the `useKnowledgeBaseIndices` hook/route), which is backed by the [resolve indices ES API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-resolve-index). System indices are filtered out with the `*,-.*` filter. The initial call will return all indices just as the Discover 'Create a data view' interface does, and the list is further filtered as the user continues to type. Note: Discover makes subsequent calls upon further user input, though I'm not entirely sure this is necessary here as all indices are initially returned and available for client-side filtering within the input. I will perform further stress testing with many indices/mappings to confirm.

The `Field` input now uses the fields already queried for the `Output fields` input suggestions (via `dataViews.getFieldsForWildcard()`), just filtered to those whose `field.esTypes?.includes('text')`. This is potentially still a hot path for client-side code with many mappings, so I will also confirm this with further stress testing.

### Docs

@elastic/security-docs / @benironside, we will need to update the Security Assistant KB docs [here](https://www.elastic.co/docs/solutions/security/ai/ai-assistant-knowledge-base#knowledge-base-add-knowledge-index) to indicate that any `text` field is now supported for retrieval. Docs issue here: elastic/docs-content#2628

### Testing
#### Functional Testing

To confirm proper retrieval of both `text` and `semantic_text` fields we'll need to create an index, add a document, then create the KB Index Entry. Then update the KB Index Entry to reference the other field type, and test again.

- Create Index
- Create Sample Doc
- Create KB Index Entry (Text)

Now you can open the Assistant and perform a query like "Do I have any project details about the issue at hand?"

This should result in a [trace like this](https://smith.langchain.com/public/208338a6-42f3-4a9d-bc4c-0ff38d06d34c/r), which calls the generated tool that will then perform a _lexical search_ against the configured index. Ensure citations work as expected for the returned document.

Now open the Index Entry in the KB Settings UI and change the `field` from `project_issue` to `summary` for testing `semantic_text`.

Open the Assistant and perform a query like "Do I have any project details for Project Yellow?"

This should result in a [trace like this](https://smith.langchain.com/public/3041a48b-5dfd-4c89-9f51-0bcdffd38f63/r), which calls the generated tool that will then perform a _semantic search_ against the configured index. Ensure citations work as expected for the returned document.
#### Performance Testing

`ci-cloud-deploy` instance with the same setup for confirmation.

### Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.