[Security solution] AI alert triage steps + workflow #253245

Open
KDKHD wants to merge 75 commits into elastic:main from KDKHD:feature/alert-false-triage-workflow3

@KDKHD
Member

@KDKHD KDKHD commented Feb 16, 2026

Summary

Merge #251291 first ✅

Summarize your PR. If it involves visual changes include a screenshot or gif.

Important

All changes in this PR are gated behind feature flags.

Introduces a preinstalled Security Solution workflow, security.alert.validation, and the supporting Security workflow step definitions used to build enriched alert context and drive AI-assisted triage.

image

Security Solution workflows

Step definitions

Adds two Security workflow step types (server + public registration):

(expandable ⬇️)

security.renderAlertNarrative — Renders a human-readable narrative string for an alert from its event, process, network, and host fields (Timeline-like plain-English summary for notes and LLM context).

Generates a human-readable story about what happened in the alert, similar to what is shown in the security timeline or the alert reason. In the workflow, this is used to give the LLM additional context about related alerts. Converting to this human-readable format makes the alerts easier for the LLM to understand.

image image

Similar to this:
image

Schema:

              - name: get_related_alert_timeline_string
                type: security.renderAlertNarrative
                with:
                  alertId: "{{ foreach.item.alert_id }}"
                  alertIndex: "{{ foreach.item.alert_index }}"
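
As a rough illustration of the idea only (not the step's actual implementation), a narrative renderer could map a handful of ECS-style fields to one sentence. The `AlertFields` shape, the `renderNarrativeSketch` name, the field selection, and the sentence template are all assumptions made for this sketch.

```typescript
// Hypothetical sketch: turn a few ECS-style alert fields into one sentence.
// The field selection and wording are assumptions, not the step's real output.
interface AlertFields {
  'host.name'?: string;
  'user.name'?: string;
  'process.name'?: string;
  'event.action'?: string;
  'destination.ip'?: string;
}

function renderNarrativeSketch(alert: AlertFields): string {
  const user = alert['user.name'];
  const proc = alert['process.name'];
  const action = alert['event.action'];
  const host = alert['host.name'];
  const destIp = alert['destination.ip'];

  const parts: string[] = [];
  // Fall back to neutral wording when a field is absent.
  parts.push(user !== undefined ? user : 'an unknown user');
  parts.push(proc !== undefined ? 'ran process ' + proc : 'triggered an event');
  if (action !== undefined) parts.push('(' + action + ')');
  if (host !== undefined) parts.push('on host ' + host);
  if (destIp !== undefined) parts.push('connecting to ' + destIp);
  return parts.join(' ') + '.';
}
```

A plain sentence like this is what gets appended per related alert so the LLM does not have to interpret raw field dumps.
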
security.buildAlertEntityGraph — Builds an entity-correlation graph from a seed alert via BFS over shared entities (e.g. host, user, service) with configurable scoring and time-window controls; returns nodes, edges, alerts (sorted by timestamp), and stats.

The workflow step definition lets you configure which entities are extracted from alerts and used to find related alerts. The flow is:

  1. Fetch the seed alert.
  2. Extract the entities defined in entity_fields from the seed alert.
  3. Find other alerts that share the same entity values and were created within the seed alert time ± seed_window.
  4. Repeat the process for the newly found alerts: extract their entities and find further alerts within each alert's time ± expand_window.
  5. Repeat up to max_depth times or until max_alerts is reached.

The related alerts found this way give the LLM additional context about the seed alert.
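
The flow above can be sketched as a breadth-first expansion over an in-memory list of alerts. This is a minimal sketch under stated assumptions, not the step's implementation: the real step queries the alerts index, while here `SketchAlert`, `SketchOptions`, and `findRelatedSketch` are hypothetical names, scoring is left out, and the seed is assumed to count toward `maxAlerts`.

```typescript
// Hypothetical in-memory sketch of the expansion loop described above.
// A real step would query Elasticsearch; here alerts live in an array.
interface SketchAlert {
  id: string;
  timestamp: number; // epoch milliseconds
  entities: { [field: string]: string };
}

interface SketchOptions {
  entityFields: string[];
  seedWindowMs: number;
  expandWindowMs: number;
  maxDepth: number;
  maxAlerts: number; // assumption: the seed counts toward this limit
}

// True when the two alerts share a value for at least one configured field.
function sharesEntity(a: SketchAlert, b: SketchAlert, fields: string[]): boolean {
  return fields.some(function (f) {
    return a.entities[f] !== undefined && a.entities[f] === b.entities[f];
  });
}

function findRelatedSketch(seedId: string, all: SketchAlert[], opts: SketchOptions): string[] {
  const seed = all.filter(function (a) { return a.id === seedId; })[0];
  if (seed === undefined) return [];
  const found: string[] = [seed.id];
  let frontier: SketchAlert[] = [seed];
  for (let depth = 0; depth < opts.maxDepth && frontier.length > 0; depth++) {
    // The seed uses seed_window; alerts found later use expand_window.
    const windowMs = depth === 0 ? opts.seedWindowMs : opts.expandWindowMs;
    const next: SketchAlert[] = [];
    for (let i = 0; i < frontier.length; i++) {
      const current = frontier[i];
      for (let j = 0; j < all.length; j++) {
        const candidate = all[j];
        if (found.length >= opts.maxAlerts) return found;
        if (found.indexOf(candidate.id) !== -1) continue;
        const inWindow = Math.abs(candidate.timestamp - current.timestamp) <= windowMs;
        if (inWindow && sharesEntity(current, candidate, opts.entityFields)) {
          found.push(candidate.id);
          next.push(candidate);
        }
      }
    }
    frontier = next;
  }
  return found;
}
```

Because each level measures its window from the alert being expanded rather than from the seed, an alert outside the seed window can still be reached through an intermediate alert, which is why max_depth and max_alerts matter as stopping conditions.
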

Schema:

          # Related alerts: builds a graph of alerts that share entity values
          # (host, user, process, IP, etc.) with the seed alert. Scores edges
          # by field weight to surface the most relevant connections.
          - name: get_related_alerts
            type: security.buildAlertEntityGraph
            with:
              alertId: "{{foreach.item._id}}" # seed alert id
              alertIndex: "{{foreach.item._index}}" # seed alert index
              include_seed: true # if seed alert should be included in output
              entity_fields: # entities to find matches on
                # --- Strong, stable identifiers (likely same attack) ---
                - field: "process.entity_id" # entity name
                  score: 5 # entity score. A score is calculated for related alerts. Score must be higher than min_entity_score for the match to be made
                - field: "agent.id"
                  score: 4
                - field: "entity.id"
                  score: 4
                - field: "host.id"
                  score: 4

                # --- Identity-level correlation ---
                - field: "user.id"
                  score: 4
                - field: "user.name"
                  score: 2

                # --- Host-level correlation ---
                - field: "host.name"
                  score: 2
                - field: "host.hostname"
                  score: 2

                # --- Service / workload identity ---
                - field: "service.id"
                  score: 2

                # --- Network-based correlation (weaker, noisier) ---
                - field: "source.ip"
                  score: 1
                  aliases: # other fields whose value, if it equals this field's value, also marks the alert as related
                    - field: "destination.ip"
                      score: 4 # lateral-movement detection
                - field: "destination.ip"
                  score: 1
                  aliases:
                    - field: "source.ip"
                      score: 4 # lateral-movement detection

                # --- Container / cloud (contextual) ---
                - field: "container.id"
                  score: 2
              min_entity_score: 4 # min score to be considered a related alert. Calculated by summing the scores of matching entity fields
              # Ignore common service accounts that create false correlation chains
              ignore_entities: # Entity values to ignore to avoid noise
                - field: "user.name"
                  values: ["root", "SYSTEM", "Administrator"]
              seed_window: "1h" # time to expand search around seed alert
              expand_window: "1h" # time to expand search around discovered alerts
              max_depth: 20 # max depth of search
              max_alerts: 20 # max alerts to retrieve 
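
To make the scoring comments in the schema concrete, here is a minimal sketch of how a match score could be summed from entity_fields, aliases, and ignore_entities. It is an illustration under assumptions, not the step's code: `FieldRule`, `IgnoreRule`, `matchScoreSketch`, and `isRelatedSketch` are hypothetical names, the alias semantics are inferred from the lateral-movement comment, and the threshold is treated as inclusive because the comments on score and min_entity_score can be read either way.

```typescript
// Hypothetical sketch of the match scoring implied by the config above.
interface FieldRule {
  field: string;
  score: number;
  aliases?: { field: string; score: number }[];
}
interface IgnoreRule {
  field: string;
  values: string[];
}
type EntityDoc = { [field: string]: string };

function matchScoreSketch(
  source: EntityDoc,
  candidate: EntityDoc,
  rules: FieldRule[],
  ignore: IgnoreRule[]
): number {
  let total = 0;
  rules.forEach(function (rule) {
    const value = source[rule.field];
    if (value === undefined) return;
    // Skip values listed under ignore_entities for this field.
    const ignored = ignore.some(function (ig) {
      return ig.field === rule.field && ig.values.indexOf(value) !== -1;
    });
    if (ignored) return;
    // Direct match: same field, same value.
    if (candidate[rule.field] === value) total += rule.score;
    // Alias match: the same value appears under a different field.
    (rule.aliases || []).forEach(function (alias) {
      if (candidate[alias.field] === value) total += alias.score;
    });
  });
  return total;
}

function isRelatedSketch(score: number, minEntityScore: number): boolean {
  // Assumption: the threshold is inclusive.
  return score >= minEntityScore;
}
```

With the weights shown in the schema, a shared user.name alone (2) would not reach min_entity_score: 4, while user.name plus host.name (2 + 2) would, and a source.ip value reappearing as destination.ip scores 4 through the alias.
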

Preinstalled workflows bootstrap

Adds a preinstalled-workflows bootstrap that installs or updates registered workflow YAMLs via workflowsManagement using a system request, gated behind a feature flag. This is a stopgap until system workflows are supported.

Enabled via feature flag:

    # kibana.dev.yml
    feature_flags.overrides.securitySolution.preinstalledWorkflowsEnabled: true

Preinstalled workflow: security.alert.validation

Adds preinstalled workflow security.alert.validation (alert_validation_workflow.yml) that:

  • Loops over triggered alerts.
  • Gathers enrichment context using a mix of generic steps and the Security steps above:
    • Close history (search + data steps).
    • Global prevalence (search + data steps).
    • Related alerts graph via security.buildAlertEntityGraph.
    • Rule metadata (search + data steps).
    • Narrative strings via security.renderAlertNarrative.
  • Calls ai.agent with a structured output schema and a security.alert attachment.
  • Writes verdict notes, applies workflow tags, and conditionally attempts auto-close for false positives.

Important

In the future, once system workflows are fully supported, this bootstrapping will be replaced. For now, this is being used during active development and testing.

How to test:

  1. Enable the preinstalled workflow and the new steps:
    # kibana.dev.yml
    feature_flags.overrides.securitySolution.preinstalledWorkflowsEnabled: true # new workflow
    feature_flags.overrides.securitySolution.registerAlertValidationStepsEnabled: true # new steps
  2. Go to the workflow: http://localhost:5601/app/workflows/workflow-3cf6d7f4-864f-4722-834d-ae1743f445ea
  3. Run the workflow by selecting an alert as input. The LLM connector ID may need to be changed depending on whether you have access to EIS chat models.
  4. View the alert notes for the respective alert and see how the LLM's discoveries are added. Also check in the alert properties that the workflow tags have been added.
image image
  5. Re-running the workflow on the same alert will not produce new results unless `override_previous: true` is set or the tags added to the alert are removed.
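
The re-run behaviour in the last step can be pictured as a simple guard. This is only a sketch of the described behaviour: `shouldTriageSketch` and the `tagPrefix` parameter are hypothetical, and the actual tag names written by the workflow are not stated in this description.

```typescript
// Hypothetical sketch: decide whether an alert should be triaged again.
// The tag prefix is a placeholder; the PR description does not name the tags.
function shouldTriageSketch(
  alertTags: string[],
  overridePrevious: boolean,
  tagPrefix: string
): boolean {
  // override_previous: true always forces a fresh run.
  if (overridePrevious) return true;
  // Any tag carrying the workflow's prefix means a verdict already exists.
  const alreadyTriaged = alertTags.some(function (tag) {
    return tag.indexOf(tagPrefix) === 0;
  });
  return !alreadyTriaged;
}
```
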

Workflow execution
image

Checklist

Check the PR satisfies following conditions.

Reviewers should verify this PR satisfies this list as well.

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
  • Flaky Test Runner was used on any tests changed
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
  • Review the backport guidelines and apply applicable backport:* labels.

Identify risks

Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss.

Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.

@KDKHD
Member Author

KDKHD commented Feb 16, 2026

/ci


@KDKHD
Member Author

KDKHD commented Feb 17, 2026

/ci


    entity_fields:
      - field: "process.entity_id"
        score: 5
      - field: "agent.id"

Contributor

Maybe replace this with host.id, because some integrations ingest data into Elastic from one agent on behalf of many hosts (effectively proxying logs, so one agent.id represents multiple host.ids):

Image

Contributor

An example of a community-reported issue on our OOB rules with a similar problem (aggregating by agent.id vs host.id): https://elasticstack.slack.com/archives/C016E72DWDS/p1772328226257339?thread_ts=1772328054.787369&cid=C016E72DWDS

@KDKHD KDKHD mentioned this pull request Mar 2, 2026
10 tasks
Contributor

@Mikaayenson Mikaayenson left a comment

One other thing we need to think about is how this will handle Esql.* fields or HOR alerts, which may not include some of the fields used in this workflow. ESQL alerts are among the hardest to triage today. Maybe add Esql.* to SOURCE_INCLUDES, although there may be a bigger implementation issue that needs to be addressed in other places.

    const SOURCE_INCLUDES = [
      'event.category',
      'event.action',
      'event.outcome',

Contributor

Suggested change
-     'event.outcome',
+     'event.outcome',
+     'event.provider',

I think this is a great first step for endpoint-specific alerts. We should add a TODO/reminder to expand this to cover other domains: cloud, SaaS, and K8s context, for example, are severely missing here.

    unique_users: "${{ variables.global_prevalence_unique_users }}"
    top_hosts: "${{ variables.global_prevalence_top_hosts }}"
    message: >-
      {% if variables.global_prevalence_unique_hosts >= 50 -%}

Contributor

These types of hardcoded values need to be thoroughly tested. Do we have any test results to back them?

    - name: set_related_alert_line
      type: data.set
      with:
        related_alert_line: "{{foreach.item.alert_index}}:{{ foreach.item.alert_id }}: {{ steps.get_related_alert_timeline_string.output.timeline_string }}"

Contributor

We should probably add context to these related alerts as to why they're related to help distinguish between something correlated vs more noise.

Member Author

I could include which entity caused the relation?

    - name: build_agent_message
      type: data.set
      with:
        message: |

Contributor

We should filter based on kibana.alert.rule.type; for example, threat indicator match and EQL rules are not the same.

Member Author

Can you clarify what you mean by filtering? Filtering which alerts go through this flow?

    query: |-
      FROM .alerts-security.alerts-*
      | WHERE `kibana.alert.rule.uuid` == ?1
      | STATS total_alerts = COUNT(*), unique_hosts = COUNT_DISTINCT(`host.name`), unique_users = COUNT_DISTINCT(`user.name`)

Contributor

++ similar to what I mentioned before, this is heavily endpoint focused. Some cloud alerts may not have the same fields.

Member Author

Valid concern. Will try to improve this.

Member Author

I have replaced the query with an Elasticsearch query that uses a script to calculate effective_source and effective_user.

    # specificity so strong identifiers (process.entity_id, user.id)
    # rank higher than noisier ones (source.ip, host.name).
    - name: get_related_alerts
      type: security.buildAlertEntityGraph

Contributor

@Mikaayenson Mikaayenson Mar 2, 2026

Something to think about: with the new EUID coming from the entity analytics folks, we may be able to use it here. Also, is there any reason why we can't use the entity store (.entities-v1-latest.*)?

We can probably add entity.id and *.entity.id, or maybe an enrichment step that uses the entity analytics store. I imagine this will be more resilient over time.

Co-authored-by: Mika Ayenson, PhD <Mikaayenson@users.noreply.github.com>
@elasticmachine
Contributor

elasticmachine commented Mar 3, 2026

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #75 / Rules Management - Prebuilt Rules (Common tests) @ess @serverless @skipInServerlessMKI Review installation using mocked prebuilt rule assets Pagination returns correct rules for a page specified in the request

Metrics [docs]

‼️ ERROR: no builds found for mergeBase sha [8500782]

History

@KDKHD KDKHD added the ci:cloud-deploy Create or update a Cloud deployment label Mar 4, 2026
@KDKHD KDKHD changed the title from "[Security solution] AI alert false positive triage steps + workflow" to "[Security solution] AI alert triage steps + workflow" Apr 24, 2026
@talboren talboren removed the Team:One Workflow Team label for One Workflow (Workflow automation) label Apr 26, 2026
Contributor

related_alert_ids isn’t declared in the ai.agent schema block above (lines 753–780), so structured output may never include this field. Please either add it to the schema (with the expected shape, e.g. { alert_id }[]) or drop/guard this “Related Alert IDs” section so the template doesn’t depend on an undefined property.

@botelastic botelastic Bot added the Team:One Workflow Team label for One Workflow (Workflow automation) label May 8, 2026
Labels

  • backport:skip (This PR does not require backporting)
  • ci:cloud-deploy (Create or update a Cloud deployment)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:One Workflow (Team label for One Workflow (Workflow automation))

10 participants