[Observability Agent] Added markdown links to select entities #247536
yuliia-fryshko wants to merge 75 commits into elastic:main
**Examples:**
- "The [billing-service](/app/apm/services/billing-service) is experiencing high latency."
- "See trace [8a3c42](/app/apm/link-to/trace/8a3c42) for the full request flow."
- "Error [abcde](/app/apm/services/frontend/errors/abcde) in [frontend](/app/apm/services/frontend)."
This is a good start! Can you add support for more entities?
- Infrastructure metrics like `host.name`/`container.id`.
- APM entities like transactions, dependencies and service map
- Platform objects like alerts, ML jobs,
Thank you, @sorenlouv!
I added some additional links, so the agent will now render Markdown links for:
- APM: Services, Traces, Errors (individual and service-level), Transactions, Dependencies, Service Map
- Logs: Service-specific logs and general logs explorer
- Infrastructure: Hosts
- Platform: Alerts, ML Jobs
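As a rough illustration of the link shapes discussed so far, the sketch below builds Markdown links for the four APM paths that appear in this thread. The `entityLinks` object and its method names are hypothetical and not part of the PR; only the URL patterns come from the examples above.

```typescript
// Hypothetical helpers (not from the PR). Only the URL patterns are taken
// from the examples in this conversation.
function seg(value: string): string {
  // Encode a single path segment so unusual names cannot break the URL.
  return encodeURIComponent(value);
}

const entityLinks = {
  service: (serviceName: string): string =>
    '[' + serviceName + '](/app/apm/services/' + seg(serviceName) + ')',
  trace: (traceId: string): string =>
    '[' + traceId + '](/app/apm/link-to/trace/' + seg(traceId) + ')',
  error: (serviceName: string, errorKey: string): string =>
    '[' + errorKey + '](/app/apm/services/' + seg(serviceName) + '/errors/' + seg(errorKey) + ')',
  serviceErrors: (serviceName: string): string =>
    '[Errors](/app/apm/services/' + seg(serviceName) + '/errors)',
};
```

The sketch does not escape `]` in link text, which a real implementation would probably need to handle.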
| Service | \`[service-name](/app/apm/services/service-name)\` |
| Trace | \`[trace-id](/app/apm/link-to/trace/trace-id)\` |
| Error | \`[error-key](/app/apm/services/service-name/errors/error-key)\` |
| Service Errors | \`[Errors](/app/apm/services/service-name/errors)\` |
I think you need to play around with different formats to find the one that works best with the frontier LLMs.
Have you tried running 100 iterations of different examples to see how often the LLM gets it right?
Something that stands out:
No clear placeholders:
Current: | Service Errors | \`[Errors](/app/apm/services/service-name/errors)\` |
Suggested: | Service Errors | \`[Errors](/app/apm/services/<serviceName>/errors)\` |
Also, there is no clear connection between the format here and the examples below. For example, I looked for an example called "Service Errors" below but didn't find one. I imagine the LLM might have similar difficulty mapping the two. Perhaps the two formats should be co-located:
| Service Errors | \`[Errors](/app/apm/services/service-name/errors)\` |
- "View all [errors](/app/apm/services/frontend/errors) for the [frontend](/app/apm/services/frontend) service."
Or you need a common key mapping them together. Either way, make sure you verify that this works reliably.
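One way to act on this feedback is sketched below: each entity keeps its key, placeholder template and example in a single record, and the same records drive a check that could be run over many sampled LLM responses. All names here (`ENTITY_LINK_SPECS`, `passRate`, and so on) are hypothetical, and the check only validates link targets against the templates; it is not the PR's implementation.

```typescript
// Hypothetical sketch: formats and examples co-located under a shared key.
interface EntityLinkSpec {
  key: string;
  template: string; // uses <placeholder> segments
  example: string;
}

const ENTITY_LINK_SPECS: EntityLinkSpec[] = [
  { key: 'service', template: '/app/apm/services/<serviceName>', example: '[frontend](/app/apm/services/frontend)' },
  { key: 'serviceErrors', template: '/app/apm/services/<serviceName>/errors', example: '[errors](/app/apm/services/frontend/errors)' },
  { key: 'error', template: '/app/apm/services/<serviceName>/errors/<errorKey>', example: '[abcde](/app/apm/services/frontend/errors/abcde)' },
  { key: 'trace', template: '/app/apm/link-to/trace/<traceId>', example: '[8a3c42](/app/apm/link-to/trace/8a3c42)' },
];

// Render one prompt table so the LLM sees format and example side by side.
function renderEntityLinkTable(specs: EntityLinkSpec[]): string {
  const rows = specs.map((s) => '| ' + s.key + ' | `[text](' + s.template + ')` | ' + s.example + ' |');
  return ['| Key | Format | Example |', '| --- | --- | --- |'].concat(rows).join('\n');
}

// Turn a template into a RegExp; each placeholder must be filled with a
// non-empty segment that contains no slash, bracket or leftover "<...>".
function templateToRegExp(template: string): RegExp {
  const pattern = template
    .split(/<[A-Za-z]+>/)
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('[^/()<>\\s]+');
  return new RegExp('^' + pattern + '$');
}

const ALLOWED_TARGETS: RegExp[] = ENTITY_LINK_SPECS.map((s) => templateToRegExp(s.template));

// Collect the URL part of every Markdown link in a response.
function linkTargets(markdown: string): string[] {
  const linkPattern = /\[[^\]]+\]\(([^)\s]+)\)/g;
  const targets: string[] = [];
  let match = linkPattern.exec(markdown);
  while (match !== null) {
    targets.push(match[1]);
    match = linkPattern.exec(markdown);
  }
  return targets;
}

// A response passes if it has at least one link and every link matches a template.
function isWellFormed(response: string): boolean {
  const targets = linkTargets(response);
  if (targets.length === 0) return false;
  return targets.every((t) => ALLOWED_TARGETS.some((p) => p.test(t)));
}

// Fraction of sampled responses that pass, e.g. over 100 iterations.
function passRate(responses: string[]): number {
  if (responses.length === 0) return 0;
  return responses.filter(isWellFormed).length / responses.length;
}
```

Feeding `passRate` the responses from repeated runs of the same prompt gives a single number to compare between prompt formats. Whether a response should be required to contain a link at all is a design choice this sketch makes arbitrarily.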
` + ENTITY_LINKING_PROMPT
),
Can you make this consistent with the other sections?
Current:
` + ENTITY_LINKING_PROMPT
),
Suggested:
${getEntityLinkInstructions()}
`),
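A minimal sketch of what the suggested shape might look like: the instructions come from a function that is interpolated into the prompt template, instead of a constant concatenated after it. Only the name `getEntityLinkInstructions` comes from the suggestion; the function body, `getSystemPrompt` and the prompt text are placeholders invented for illustration.

```typescript
// Name taken from the review suggestion; the body is an illustrative placeholder.
function getEntityLinkInstructions(): string {
  return [
    'Format known entities as Markdown links:',
    '- Service: [<serviceName>](/app/apm/services/<serviceName>)',
    '- Trace: [<traceId>](/app/apm/link-to/trace/<traceId>)',
  ].join('\n');
}

// Hypothetical prompt builder: every section is interpolated the same way.
function getSystemPrompt(): string {
  return `You are an assistant for Observability data.

${getEntityLinkInstructions()}
`;
}
```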
import { OBSERVABILITY_AGENT_TOOL_IDS } from '../tools/register_tools';
import { OBSERVABILITY_GET_INDEX_INFO_TOOL_ID } from '../tools';
import { getAgentBuilderResourceAvailability } from '../utils/get_agent_builder_resource_availability';
import { ENTITY_LINKING_PROMPT } from '../utils/entity_linking_prompt';
This is not a util. I'd like to see all instructions co-located in a file or folder, e.g. `/agent/instructions.ts`, and exported so they can be re-used in agents, attachments, etc.
... actually, you will only have to add it to the Obs Agent when this is merged: #249776
So I'd suggest keeping it inline in this file, similar to the other instructions
…ct for > 4000 users limitation (elastic#249775)

### Summary

This PR fixes the eggbox on privileged monitoring not showing > 4000 user count accurately due to [limitation with ESQL's count distinct](https://www.elastic.co/docs/reference/query-languages/esql/functions-operators/aggregation-functions#esql-agg-count-distinct-approximate)

The main change here is ~~using lensAttributes instead to use a DSL query instead~~. [**EDIT**] Changing the ESQL query to use double STATS.

```
FROM ${getPrivilegedMonitorUsersIndex(namespace)}
| WHERE user.is_privileged == true
| STATS BY user.name
| STATS count = COUNT(*)
```

Works well for 1k, 4k, and 10k user counts.

**To test:**

1. Navigate to kibana (loaded up etc)
2. Upload > 1000 users, previously tested with 1008 users.
3. Should see on eggbox, privileged user count is accurate and matching with dev tools result below:
4. dev tools command:
   ```
   GET .entity_analytics.monitoring.users-*/_search
   {
     "size": 0,
     "aggs": {
       "by_priv": {
         "terms": { "field": "user.is_privileged" }
       }
     }
   }
   ```
5. Edit your csv, remove some of these users and re-upload.
6. Ensure the count is accurate to the new number of csv uploaded users and the omitted users show privileged false in dev tools.

If someone wants to use a scripted way to generate csv users, may use the below cli command

```
for i in {1..10000}; do echo "bulk_user_$i" >> privileged_users.csv; done && wc -l privileged_users.csv
```

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: abhishekbhatia1710 <abhishek.bhatia@elastic.co>
Co-authored-by: Abhishek Bhatia <117628830+abhishekbhatia1710@users.noreply.github.com>
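The double STATS approach in the commit message above can be sketched as follows. The first `STATS BY user.name` yields one row per distinct name and the second counts those rows, so the result is exact and avoids the approximate distinct count. The index name pattern inside `getPrivilegedMonitorUsersIndex` is an assumption based on the dev tools command, and `exactDistinctCount` is only a local illustration of the same grouping, not code from the commit.

```typescript
// Assumed stand-in for the real helper; the naming pattern is a guess based on
// the `.entity_analytics.monitoring.users-*` index used in the dev tools command.
function getPrivilegedMonitorUsersIndex(namespace: string): string {
  return '.entity_analytics.monitoring.users-' + namespace;
}

// Build the ES|QL query shown in the commit message.
function buildPrivilegedUserCountQuery(namespace: string): string {
  return [
    'FROM ' + getPrivilegedMonitorUsersIndex(namespace),
    '| WHERE user.is_privileged == true',
    '| STATS BY user.name', // one row per distinct user.name
    '| STATS count = COUNT(*)', // count those rows exactly
  ].join('\n');
}

// Local illustration of what the two STATS stages compute together.
function exactDistinctCount(rows: Array<{ name: string; isPrivileged: boolean }>): number {
  const groups: Record<string, true> = {};
  for (let i = 0; i < rows.length; i++) {
    if (rows[i].isPrivileged) groups[rows[i].name] = true; // group by name
  }
  return Object.keys(groups).length; // count the groups
}
```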
…tching entities (elastic#247815)

## Summary

This PR introduces LOOKUP JOIN as the primary entity enrichment mechanism while maintaining backward compatibility with the deprecated ENRICH policy during the transition period.

Closes [issue](elastic#232226) and multiple flaky tests due to entity store infra initialization instability.

**Server-side changes (fetch_graph.ts)**
- Implement LOOKUP JOIN query generation for entity enrichment
- Add fallback logic: LOOKUP JOIN → ENRICH policy → no enrichment
- Add `getEntitiesLatestIndexName` helper for v2 index names

**Test infrastructure**
- Add `executeEnrichPolicy` helper to entity_store.ts utils
- Create entity_store_v2 test archives with lookup mode mappings
- Create entity_store_v2_standard_mode for fallback scenario testing

**API integration tests (graph.ts)**
- Refactor 'Enrich graph with entity metadata' to test both flows
- Add enrichmentConfigs array for ENRICH (v1) and LOOKUP JOIN (v2)
- Add fallback test: v2 index exists but not in lookup mode

**FTR functional tests**
- Update alerts_flyout.ts with dual enrichment config support
- Update events_flyout.ts with dual enrichment config support
- Reuse entity_store_v2 archives across functional tests

**Api/FTR tests coverage**

| Scenario | v2 Lookup Index | ENRICH Policy | Expected Path | Currently Tested? |
| --- | --- | --- | --- | --- |
| 1 | ✅ Exists in lookup mode | N/A | LOOKUP JOIN | ✅ v2 tests |
| 2 | ❌ Doesn't exist | ✅ Exists | ENRICH | ✅ v1 tests |
| 3 | ❌ Doesn't exist | ❌ Doesn't exist | No enrichment | ✅ All other tests (Happy flows, Validation, etc.) |

v2 refers to the new mappings and data mocks we load to test the LOOKUP JOIN functionality. Each test can be added just once and it will be tested in both scenarios (ENRICH and LOOKUP JOIN) until we stop supporting querying enrich policies.

## How to test

1. Deploy a local env using the following command: `node scripts/es snapshot --license trial -E path.data=../default -E reindex.remote.whitelist=kfir-graph-viz-wip-ba715e.es.eu-west-1.aws.qa.elastic.cloud:443 -E xpack.security.authc.api_key.enabled=true`
2. Run kibana using `yarn start`
3. Go to `Advanced settings` and make sure the `securitySolution:enableGraphVisualization` and `securitySolution:enableAssetInventory` features are toggled on.
4. Go to Security -> inventory -> click on 'Enable Asset Inventory'.
5. Install latest gcp-auditlogs integration (skip agent installation) v2.46.0 and above.
6. Install aws-cloudtrail integration (skip agent installation) v4.7.0 and above.
7. Install cloud asset discovery integration (skip agent installation).
8. Reindex gcp-auditlogs data from long-live env:
   ```
   POST _reindex
   {
     "conflicts": "proceed",
     "source": {
       "remote": {
         "host": "https://kfir-graph-viz-wip-ba715e.es.eu-west-1.aws.qa.elastic.cloud:443",
         "socket_timeout": "30s",
         "connect_timeout": "30s",
         "headers": { "Authorization": "<api key>" }
       },
       "index": "logs-*",
       "query": {
         "bool": {
           "must": [
             { "term": { "data_stream.dataset": "gcp.audit" } },
             {
               "bool": {
                 "should": [
                   { "exists": { "field": "user.entity.id" } },
                   { "exists": { "field": "host.entity.id" } },
                   { "exists": { "field": "service.entity.id" } },
                   { "exists": { "field": "entity.id" } }
                 ],
                 "minimum_should_match": 1
               }
             },
             {
               "bool": {
                 "should": [
                   { "exists": { "field": "user.target.entity.id" } },
                   { "exists": { "field": "host.target.entity.id" } },
                   { "exists": { "field": "service.target.entity.id" } },
                   { "exists": { "field": "entity.target.id" } }
                 ],
                 "minimum_should_match": 1
               }
             }
           ]
         }
       }
     },
     "dest": { "op_type": "create", "index": "logs-gcp.audit-default" }
   }
   ```
9. Reindex aws-cloudtrail data from long-live env:
   ```
   POST _reindex
   {
     "conflicts": "proceed",
     "source": {
       "remote": {
         "host": "https://kfir-graph-viz-wip-ba715e.es.eu-west-1.aws.qa.elastic.cloud:443",
         "socket_timeout": "30s",
         "connect_timeout": "30s",
         "headers": { "Authorization": "<api key>" }
       },
       "index": "logs-aws.cloudtrail-default",
       "query": {
         "bool": {
           "must": [
             {
               "bool": {
                 "should": [
                   { "exists": { "field": "user.entity.id" } },
                   { "exists": { "field": "host.entity.id" } },
                   { "exists": { "field": "service.entity.id" } },
                   { "exists": { "field": "entity.id" } }
                 ],
                 "minimum_should_match": 1
               }
             },
             {
               "bool": {
                 "should": [
                   { "exists": { "field": "user.target.entity.id" } },
                   { "exists": { "field": "host.target.entity.id" } },
                   { "exists": { "field": "service.target.entity.id" } },
                   { "exists": { "field": "entity.target.id" } }
                 ],
                 "minimum_should_match": 1
               }
             }
           ]
         }
       }
     },
     "dest": { "op_type": "create", "index": "logs-aws.cloudtrail-default" }
   }
   ```
10. Reindex entities data from long-live env:
    ```
    POST _reindex?wait_for_completion=true
    {
      "conflicts": "proceed",
      "source": {
        "remote": {
          "host": "https://kfir-graph-viz-wip-ba715e.es.eu-west-1.aws.qa.elastic.cloud:443",
          "socket_timeout": "30s",
          "connect_timeout": "30s",
          "headers": { "Authorization": "message for api key" }
        },
        "index": ".entities.v1.latest.security_generic_default",
        "query": {
          "bool": {
            "must": [],
            "filter": [
              { "range": { "@timestamp": { "gte": "now-2y", "lte": "now" } } }
            ]
          }
        }
      },
      "dest": { "op_type": "create", "index": ".entities.v1.latest.security_generic_default" },
      "script": {
        "source": """
          ctx._source.doc_id = ctx._id;
          ctx._source.doc_index = ctx._index;
          if (ctx._source.asset != null) {
            if (ctx._source.asset.containsKey('category')) { ctx._source['entity.category'] = ctx._source.asset.category; }
            if (ctx._source.asset.containsKey('name')) { ctx._source['entity.name'] = ctx._source.asset.name; }
            if (ctx._source.asset.containsKey('type')) { ctx._source['entity.type'] = ctx._source.asset.type; }
            if (ctx._source.asset.containsKey('sub_type')) { ctx._source['entity.sub_type'] = ctx._source.asset.sub_type; }
            if (ctx._source.asset.containsKey('sub_category')) { ctx._source['entity.sub_category'] = ctx._source.asset.sub_category; }
          }
        """
      }
    }
    ```
11. Create an entities v2 index with lookup mode:
    ```
    PUT .entities.v2.latest.security_generic_default
    {
      "settings": { "index": { "mode": "lookup", "number_of_shards": 1, "number_of_replicas": 1 } },
      "mappings": {
        "_meta": { "version": "1.6.0" },
        "dynamic_templates": [
          { "ecs_timestamp": { "match": "@timestamp", "mapping": { "ignore_malformed": false, "type": "date" } } },
          { "ecs_message_match_only_text": { "path_match": [ "message", "*.message" ], "unmatch_mapping_type": "object", "mapping": { "type": "match_only_text" } } },
          { "ecs_non_indexed_keyword": { "path_match": [ "*event.original", "*gen_ai.agent.description" ], "mapping": { "doc_values": false, "index": false, "type": "keyword" } } },
          { "ecs_non_indexed_long": { "path_match": "*.x509.public_key_exponent", "mapping": { "doc_values": false, "index": false, "type": "long" } } },
          { "ecs_ip": { "path_match": [ "ip", "*.ip", "*_ip" ], "match_mapping_type": "string", "mapping": { "type": "ip" } } },
          { "ecs_wildcard": { "path_match": [ "*.io.text", "*.message_id", "*registry.data.strings", "*url.path" ], "unmatch_mapping_type": "object", "mapping": { "type": "wildcard" } } },
          { "ecs_path_match_wildcard_and_match_only_text": { "path_match": [ "*.body.content", "*url.full", "*url.original" ], "unmatch_mapping_type": "object", "mapping": { "fields": { "text": { "type": "match_only_text" } }, "type": "wildcard" } } },
          { "ecs_match_wildcard_and_match_only_text": { "match": [ "*command_line", "*stack_trace" ], "unmatch_mapping_type": "object", "mapping": { "fields": { "text": { "type": "match_only_text" } }, "type": "wildcard" } } },
          { "ecs_path_match_keyword_and_match_only_text": { "path_match": [ "*.title", "*.executable", "*.name", "*.working_directory", "*.full_name", "*.display_name", "*file.path", "*file.target_path", "*os.full", "*email.subject", "*vulnerability.description", "*user_agent.original" ], "unmatch_mapping_type": "object", "mapping": { "fields": { "text": { "type": "match_only_text" } }, "type": "keyword" } } },
          { "ecs_date": { "path_match": [ "*.timestamp", "*_timestamp", "*.not_after", "*.not_before", "*.accessed", "created", "*.created", "*.installed", "*.creation_date", "*.ctime", "*.mtime", "ingested", "*.ingested", "*.start", "*.end", "*.indicator.first_seen", "*.indicator.last_seen", "*.indicator.modified_at", "*threat.enrichments.matched.occurred" ], "unmatch_mapping_type": "object", "mapping": { "type": "date" } } },
          { "ecs_path_match_float": { "path_match": [ "*.score.*", "*_score*" ], "path_unmatch": "*.version", "unmatch_mapping_type": "object", "mapping": { "type": "float" } } },
          { "ecs_usage_double_scaled_float": { "path_match": "*.usage", "match_mapping_type": [ "double", "long", "string" ], "mapping": { "scaling_factor": 1000, "type": "scaled_float" } } },
          { "ecs_geo_point": { "path_match": "*.geo.location", "mapping": { "type": "geo_point" } } },
          { "ecs_flattened": { "path_match": [ "*structured_data", "*exports", "*imports" ], "match_mapping_type": "object", "mapping": { "type": "flattened" } } },
          { "ecs_gen_ai_integers": { "path_match": [ "*gen_ai.request.max_tokens", "*gen_ai.usage.input_tokens", "*gen_ai.usage.output_tokens", "*gen_ai.request.choice.count", "*gen_ai.request.seed" ], "mapping": { "type": "integer" } } },
          { "ecs_gen_ai_doubles": { "path_match": [ "*gen_ai.request.temperature", "*gen_ai.request.top_k", "*gen_ai.request.frequency_penalty", "*gen_ai.request.presence_penalty", "*gen_ai.request.top_p" ], "mapping": { "type": "double" } } },
          { "all_strings_to_keywords": { "match_mapping_type": "string", "mapping": { "ignore_above": 1024, "type": "keyword" } } },
          { "strings_as_keyword": { "match_mapping_type": "string", "mapping": { "fields": { "text": { "type": "text" } }, "ignore_above": 1024, "type": "keyword" } } },
          { "entity_metrics": { "path_match": "entity.metrics.*", "match_mapping_type": [ "long", "double" ], "mapping": { "type": "{dynamic_type}" } } }
        ],
        "date_detection": false,
        "properties": {
          "@timestamp": { "type": "date" },
          "asset": {
            "properties": {
              "business_unit": { "type": "keyword" },
              "criticality": { "type": "keyword" },
              "environment": { "type": "keyword" },
              "id": { "type": "keyword" },
              "model": { "type": "keyword" },
              "name": { "type": "keyword" },
              "owner": { "type": "keyword" },
              "serial_number": { "type": "keyword" },
              "vendor": { "type": "keyword" }
            }
          },
          "cloud": {
            "properties": {
              "account": { "properties": { "id": { "type": "keyword" }, "name": { "type": "keyword" } } },
              "availability_zone": { "type": "keyword" },
              "instance": { "properties": { "id": { "type": "keyword" }, "name": { "type": "keyword" } } },
              "machine": { "properties": { "type": { "type": "keyword" } } },
              "project": { "properties": { "id": { "type": "keyword" }, "name": { "type": "keyword" } } },
              "provider": { "type": "keyword" },
              "region": { "type": "keyword" },
              "service": { "properties": { "name": { "type": "keyword" } } }
            }
          },
          "doc_id": { "type": "keyword", "ignore_above": 1024 },
          "doc_index": { "type": "keyword", "ignore_above": 1024 },
          "entity": {
            "properties": {
              "EngineMetadata": { "properties": { "Type": { "type": "keyword", "ignore_above": 1024 } } },
              "attributes": { "properties": { "Asset": { "type": "boolean" }, "Managed": { "type": "boolean" }, "Mfa_enabled": { "type": "boolean" }, "Privileged": { "type": "boolean" } } },
              "behaviors": { "properties": { "Brute_force_victim": { "type": "boolean" }, "New_country_login": { "type": "boolean" }, "Used_usb_device": { "type": "boolean" } } },
              "definition_id": { "type": "keyword", "ignore_above": 1024 },
              "definition_version": { "type": "keyword", "ignore_above": 1024 },
              "display_name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 1024 } } },
              "id": { "type": "keyword" },
              "identity_fields": { "type": "keyword" },
              "last_seen_timestamp": { "type": "date" },
              "lifecycle": { "properties": { "First_seen": { "type": "date" }, "Last_activity": { "type": "date" } } },
              "name": { "type": "keyword" },
              "risk": { "properties": { "calculated_level": { "type": "keyword" }, "calculated_score": { "type": "float" }, "calculated_score_norm": { "type": "float" } } },
              "schema_version": { "type": "keyword", "ignore_above": 1024 },
              "source": { "type": "keyword" },
              "sub_type": { "type": "keyword" },
              "type": { "type": "keyword" },
              "url": { "type": "keyword" }
            }
          },
          "event": { "properties": { "ingested": { "type": "date" } } },
          "host": {
            "properties": {
              "architecture": { "type": "keyword" },
              "boot": { "properties": { "id": { "type": "keyword" } } },
              "cpu": { "properties": { "usage": { "type": "keyword" } } },
              "disk": { "properties": { "read": { "properties": { "bytes": { "type": "keyword" } } }, "write": { "properties": { "bytes": { "type": "keyword" } } } } },
              "domain": { "type": "keyword" },
              "hostname": { "type": "keyword" },
              "id": { "type": "keyword" },
              "ip": { "type": "ip" },
              "mac": { "type": "keyword" },
              "name": { "type": "keyword" },
              "network": {
                "properties": {
                  "egress": { "properties": { "bytes": { "type": "keyword" }, "packets": { "type": "keyword" } } },
                  "ingress": { "properties": { "bytes": { "type": "keyword" }, "packets": { "type": "keyword" } } }
                }
              },
              "pid_ns_ino": { "type": "keyword" },
              "type": { "type": "keyword" },
              "uptime": { "type": "keyword" }
            }
          },
          "labels": { "type": "object" },
          "orchestrator": {
            "properties": {
              "api_version": { "type": "keyword" },
              "cluster": { "properties": { "id": { "type": "keyword" }, "name": { "type": "keyword" }, "url": { "type": "keyword" }, "version": { "type": "keyword" } } },
              "namespace": { "type": "keyword" },
              "organization": { "type": "keyword" },
              "resource": {
                "properties": {
                  "annotation": { "type": "keyword" },
                  "id": { "type": "keyword" },
                  "ip": { "type": "keyword" },
                  "label": { "type": "keyword" },
                  "name": { "type": "keyword" },
                  "parent": { "properties": { "type": { "type": "keyword" } } },
                  "type": { "type": "keyword" }
                }
              },
              "type": { "type": "keyword" }
            }
          },
          "tags": { "type": "keyword", "ignore_above": 1024 },
          "user": {
            "properties": {
              "domain": { "type": "keyword" },
              "email": { "type": "keyword" },
              "full_name": { "type": "keyword", "fields": { "text": { "type": "match_only_text" } } },
              "hash": { "type": "keyword" },
              "id": { "type": "keyword" },
              "name": { "type": "keyword", "fields": { "text": { "type": "match_only_text" } } },
              "roles": { "type": "keyword" }
            }
          }
        }
      }
    }
    ```
12. Reindex data from v1 to v2 index:
    ```
    POST _reindex
    {
      "source": { "index": ".entities.v1.latest.security_generic_default" },
      "dest": { "index": ".entities.v2.latest.security_generic_default", "op_type": "create" }
    }
    ```
13. Go to security -> explore -> network/users/hosts.
14. Apply filters to see only events containing graph representation.
15. Open the graph and play with different filters and combinations to get nodes with entity data.
16. Graph should work as expected.

### Checklist

Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations.
- [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
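The fallback order described in the commit message above (LOOKUP JOIN → ENRICH policy → no enrichment) can be sketched as a small decision function. The type and field names below are hypothetical; the actual logic lives in `fetch_graph.ts` and may differ in detail.

```typescript
// Hypothetical types; only the three paths and their order come from the commit message.
type EnrichmentPath = 'LOOKUP_JOIN' | 'ENRICH' | 'NONE';

interface EnrichmentState {
  v2IndexExists: boolean;
  v2IndexMode?: string; // e.g. 'lookup' or 'standard'
  enrichPolicyExists: boolean;
}

function chooseEnrichmentPath(state: EnrichmentState): EnrichmentPath {
  // Prefer LOOKUP JOIN, but only when the v2 index exists AND is in lookup mode.
  if (state.v2IndexExists && state.v2IndexMode === 'lookup') return 'LOOKUP_JOIN';
  // Otherwise fall back to the deprecated ENRICH policy if one is available.
  if (state.enrichPolicyExists) return 'ENRICH';
  // Neither mechanism is available: run the graph query without enrichment.
  return 'NONE';
}
```

The second branch also covers the fallback test mentioned in the commit message, where the v2 index exists but is not in lookup mode.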
Sorry, I ran into some merge conflict issues. It will be cleaner to close this PR.
💔 Build Failed
The agent now automatically formats known Observability entities as Markdown links in its responses. This enables users to click directly on entity references to navigate to the relevant APM views, improving workflow efficiency.
Changes
Added Entity Linking instructions to the Observability Agent's system prompt and to the Error and Alert AI Insights.
Testing with Cursor:
Test Prompt: test_prompt.md
Results: hereisresults.md
Test scenario: Traces for Error AI insight and Alert AI Insight