[Security solution] AI alert triage steps + workflow #253245

KDKHD wants to merge 75 commits into elastic:main from KDKHD/kibana:feature/alert-false-triage-workflow3
Conversation
…prevent abuse circumvention through user.names in different languages
```yaml
entity_fields:
  - field: "process.entity_id"
    score: 5
  - field: "agent.id"
```
An example of a community-reported issue on our OOB rules with a similar problem (aggregating by agent.id vs host.id): https://elasticstack.slack.com/archives/C016E72DWDS/p1772328226257339?thread_ts=1772328054.787369&cid=C016E72DWDS
Mikaayenson left a comment:
One other thing we need to think about is how this will handle `Esql.*` fields, or HOR that may not include some of the fields used in this workflow; today, ES|QL alerts are among the hardest to triage. Maybe add `Esql.*` to SOURCE_INCLUDES, but there may be a bigger implementation issue that needs to be addressed in different places.
```ts
const SOURCE_INCLUDES = [
  'event.category',
  'event.action',
  'event.outcome',
  // …
```
Suggested change:
```diff
  'event.outcome',
+ 'event.provider',
```
I think this is a great first step for endpoint-specific alerts. We should add a TODO / reminder to expand this to cover other domains; e.g. cloud, SaaS, and K8s context are severely missing here.
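One way to cover dynamically named fields like `Esql.*` without enumerating them is glob-style include patterns. This is only a sketch of the idea, not the PR's implementation (Elasticsearch `_source` filtering itself accepts wildcard patterns, so the real fix may just be adding the pattern to the list):

```python
from fnmatch import fnmatch

# Hypothetical: the PR's SOURCE_INCLUDES is a flat list of concrete field
# names; this sketch allows glob patterns so dynamically named fields
# (e.g. Esql.*) survive the source filter. The Esql.* entry is an assumption.
SOURCE_INCLUDES = [
    "event.category",
    "event.action",
    "event.outcome",
    "Esql.*",
]

def field_included(field: str, patterns=SOURCE_INCLUDES) -> bool:
    """Return True when a field name matches any include pattern."""
    return any(fnmatch(field, p) for p in patterns)
```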
```yaml
unique_users: "${{ variables.global_prevalence_unique_users }}"
top_hosts: "${{ variables.global_prevalence_top_hosts }}"
message: >-
  {% if variables.global_prevalence_unique_hosts >= 50 -%}
```
These types of hardcoded values need to be thoroughly tested. Do we have any test results to back them?
```yaml
- name: set_related_alert_line
  type: data.set
  with:
    related_alert_line: "{{foreach.item.alert_index}}:{{ foreach.item.alert_id }}: {{ steps.get_related_alert_timeline_string.output.timeline_string }}"
```
We should probably add context to these related alerts explaining why they're related, to help distinguish genuine correlation from noise.

I could include which entity caused the relation?
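A hedged sketch of what that could look like: the `relation_field` / `relation_value` items are hypothetical outputs, assuming the graph step is extended to report which shared entity linked each alert.

```yaml
# Sketch only: assumes the entity-graph step exposes the linking entity
# on each foreach item (relation_field / relation_value are assumptions).
- name: set_related_alert_line
  type: data.set
  with:
    related_alert_line: >-
      {{ foreach.item.alert_index }}:{{ foreach.item.alert_id }}
      (related via {{ foreach.item.relation_field }}={{ foreach.item.relation_value }}):
      {{ steps.get_related_alert_timeline_string.output.timeline_string }}
```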
```yaml
- name: build_agent_message
  type: data.set
  with:
    message: |
```
We should filter based on `kibana.alert.rule.type`; e.g. Threat indicator match vs EQL are not the same.

Can you clarify what you mean by filtering? Filtering which alerts go through this flow?
```yaml
query: |-
  FROM .alerts-security.alerts-*
  | WHERE `kibana.alert.rule.uuid` == ?1
  | STATS total_alerts = COUNT(*), unique_hosts = COUNT_DISTINCT(`host.name`), unique_users = COUNT_DISTINCT(`user.name`)
```
++ Similar to what I mentioned before, this is heavily endpoint-focused. Some cloud alerts may not have the same fields.

Valid concern. Will try to improve this.

Have replaced the query with an Elasticsearch query that uses a script to calculate effective_source and effective_user.
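The replacement query isn't shown in this diff, so as a rough illustration of the "effective entity" idea, here is a coalescing-fallback sketch. The field names and their priority order are assumptions for illustration, not the PR's actual script:

```python
def coalesce(doc: dict, fields: list[str]):
    """Return the first non-empty value among candidate fields."""
    for f in fields:
        v = doc.get(f)
        if v:
            return v
    return None

# Assumed fallback orders: endpoint fields first, cloud fields after.
SOURCE_FIELDS = ["host.name", "cloud.instance.id", "source.ip"]
USER_FIELDS = ["user.name", "user.id", "client.user.name"]

def effective_entities(doc: dict) -> dict:
    """Compute effective_source / effective_user so cloud alerts that
    lack host.name or user.name still aggregate on something meaningful."""
    return {
        "effective_source": coalesce(doc, SOURCE_FIELDS),
        "effective_user": coalesce(doc, USER_FIELDS),
    }
```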
```yaml
# specificity so strong identifiers (process.entity_id, user.id)
# rank higher than noisier ones (source.ip, host.name).
- name: get_related_alerts
  type: security.buildAlertEntityGraph
```
Something to think about: with the new EUID coming from the entity analytics folks, we may be able to use this. Also, is there any reason why we can't use the entity store (`.entities-v1-latest.*`)? We can probably add `entity.id` and `*.entity.id`, or maybe an enrichment step that uses the entity analytics store. I imagine this will be more resilient over time.
Co-authored-by: Mika Ayenson, PhD <Mikaayenson@users.noreply.github.com>
💛 Build succeeded, but was flaky
related_alert_ids isn’t declared in the ai.agent schema block above (lines 753–780), so structured output may never include this field. Please either add it to the schema (with the expected shape, e.g. { alert_id }[]) or drop/guard this “Related Alert IDs” section so the template doesn’t depend on an undefined property.

Summary
Merge #251291 first ✅
Important
All changes in this PR are gated behind feature flags.
Introduces a Security Solution preinstalled `security.alert.validation` workflow and supporting Security workflow step definitions used to build enriched alert context and drive AI-assisted triage.

Security Solution workflows
Step definitions
Adds two Security workflow step types (server + public registration):
security.renderAlertNarrative — Renders a human-readable narrative string for an alert from its event, process, network, and host fields (Timeline-like plain-English summary for notes and LLM context).
Generates a human-readable story about what happened in the alert, similar to what is shown in the security timeline or the alert reason. In the workflow, this is used to give the LLM additional context about related alerts. Converting to this human-readable format makes the alerts easier for the LLM to understand.
Similar to this:

Schema:
security.buildAlertEntityGraph — Builds an entity-correlation graph from a seed alert via BFS over shared entities (e.g. host, user, service) with configurable scoring and time-window controls; returns nodes, edges, alerts (sorted by timestamp), and stats.
Workflow step definition allows you to configure which entities should be extracted from alerts and used to find related alerts. This is the flow:
1. Extract `entity_fields` from the seed alert.
2. Find alerts sharing those entities within `seed_window`.
3. Expand the search from found alerts within `expand_window`.
4. Repeat up to `max_depth` times or until `max_alerts` is reached.

This is used to find alerts related to the seed alert, as additional context for the LLM.
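The BFS expansion described above can be sketched roughly as follows. This is not the step's implementation: `search(field, value)` is an assumed callback standing in for the alert-index query, and the `seed_window` / `expand_window` time bounds and scoring are omitted for brevity:

```python
from collections import deque

def build_entity_graph(seed, search, entity_fields, max_depth=2, max_alerts=50):
    """BFS over alerts that share entity values with already-seen alerts.

    seed: the seed alert as a flat dict with an "alert_id" key.
    search: assumed callback (field, value) -> list of matching alert dicts.
    """
    seen_ids = {seed["alert_id"]}
    alerts = [seed]
    frontier = deque([(seed, 0)])
    while frontier and len(alerts) < max_alerts:
        alert, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for field in entity_fields:
            value = alert.get(field)
            if not value:
                continue
            for hit in search(field, value):
                if hit["alert_id"] in seen_ids:
                    continue
                seen_ids.add(hit["alert_id"])
                alerts.append(hit)
                frontier.append((hit, depth + 1))
                if len(alerts) >= max_alerts:
                    return alerts
    return alerts
```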
Schema:
Preinstalled workflows bootstrap
Adds a preinstalled workflows bootstrap that installs/updates registered workflow YAMLs via workflowsManagement using a system request, gated behind a feature flag. This approach is used until system workflows are supported.
Enabled via featureFlag:
Preinstalled workflow: security.alert.validation
Adds preinstalled workflow security.alert.validation (alert_validation_workflow.yml) that:
Important
In the future, once system workflows are fully supported, this bootstrapping will be replaced. For now, this is being used during active development and testing.
How to test:
Workflow execution

Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.
- `release_note:breaking` label should be applied in these situations.
- `release_note:*` label is applied per the guidelines.
- `backport:*` labels.

Identify risks
Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss.
Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.