[ResponseOps][Alerting] Alerting v2: Director by cnasikas · Pull Request #247673 · elastic/kibana

cnasikas · 2025-12-31T11:53:16Z

Summary

Note

Dear reviewers, this PR is being merged into a feature branch. Only the ResponseOps review is needed at the moment. We will request your review when we open the PR to merge the feature branch into main.

This PR implements the Director component of the alerting v2 core engine. The Director is an asynchronous state engine responsible for deriving alert state transitions (e.g., Pending → Active) from the immutable stream of raw alert events.

Architecture

Strategy Pattern

To keep the Director agnostic of specific business logic, we implemented the Strategy Pattern. The Director handles the data flow, while an ITransitionStrategy defines the actual state machine logic. This lets us support different transition behaviors, based on rule configuration or alert event type, without modifying the core service. It may seem like over-engineering at the moment, but I think it will help us in the long run. Currently, only one strategy is supported, the BasicTransitionStrategy, which moves an alert through inactive -> pending -> active -> recovering -> inactive based on the status of the alert event and the status of the latest episode, if one exists.
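A minimal TypeScript sketch of how the Director/strategy split could look. The type names (AlertEventStatus, EpisodeStatus, EpisodeState) and the method names getNextStatus and deriveStatus are hypothetical, not the PR's actual API. Combinations the PR does not describe (for example active + breached) are assumed to keep the current status.

```typescript
// Hypothetical types and method names for illustration; the PR's actual API may differ.
type AlertEventStatus = "breached" | "recovered";
type EpisodeStatus = "inactive" | "pending" | "active" | "recovering";

// The latest known episode for an alert (hypothetical shape).
interface EpisodeState {
  status: EpisodeStatus;
}

// A strategy owns the state machine and nothing else.
interface ITransitionStrategy {
  getNextStatus(current: EpisodeStatus, event: AlertEventStatus): EpisodeStatus;
}

// inactive -> pending -> active -> recovering -> inactive.
// Assumption: combinations not described in the PR (e.g. active + breached) keep the current status.
class BasicTransitionStrategy implements ITransitionStrategy {
  getNextStatus(current: EpisodeStatus, event: AlertEventStatus): EpisodeStatus {
    switch (current) {
      case "inactive":
        return event === "breached" ? "pending" : "inactive";
      case "pending":
        return event === "breached" ? "active" : "inactive";
      case "active":
        return event === "breached" ? "active" : "recovering";
      case "recovering":
        return event === "breached" ? "active" : "inactive";
    }
  }
}

// The Director only moves data: it receives the latest episode and delegates the decision.
class Director {
  private readonly strategy: ITransitionStrategy;

  constructor(strategy: ITransitionStrategy) {
    this.strategy = strategy;
  }

  // A missing previous episode is treated as inactive.
  deriveStatus(latest: EpisodeState | undefined, event: AlertEventStatus): EpisodeStatus {
    return this.strategy.getNextStatus(latest?.status ?? "inactive", event);
  }
}

const director = new Director(new BasicTransitionStrategy());
console.log(director.deriveStatus(undefined, "breached")); // pending
console.log(director.deriveStatus({ status: "active" }, "recovered")); // recovering
```

Passing a different ITransitionStrategy to the constructor changes the lifecycle without touching the Director class, which is the flexibility the paragraph above argues for.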

Episode Lifecycle Management

The state is calculated as follows:

Inactive + Breached → Pending: A new alert has started, but it must wait in Pending before becoming Active.
Pending + Breached → Active: The alert breached again while pending and becomes Active.
Pending + Recovered → Inactive: The condition cleared before the alert could become Active.
Active + Recovered → Recovering: An active alert has stopped breaching and enters the recovery phase.
Recovering + Breached → Active: An alert that was recovering has breached again.
Recovering + Recovered → Inactive: The alert stayed recovered and the episode ends.

The episode ID is preserved across pending, active, and recovering states. A new episode ID is generated only when transitioning from inactive to a non-inactive state (a new episode starts).
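The episode ID rule can be sketched as a small pure function. resolveEpisodeId, the Episode shape, and the injected generateId callback are hypothetical; injecting the generator keeps the sketch deterministic, whereas the real implementation presumably generates UUIDs.

```typescript
// Hypothetical names and shapes for illustration; the PR's implementation may differ.
type EpisodeStatus = "inactive" | "pending" | "active" | "recovering";

interface Episode {
  status: EpisodeStatus;
  episodeId?: string;
}

// Returns the episode ID to write on the next alert event document.
// A new ID is generated only on inactive -> non-inactive (a new episode starts);
// otherwise the previous ID, if any, is carried over.
function resolveEpisodeId(
  previous: Episode | undefined,
  nextStatus: EpisodeStatus,
  generateId: () => string,
): string | undefined {
  // No previous episode is treated as inactive.
  const previousStatus = previous?.status ?? "inactive";
  if (previousStatus === "inactive" && nextStatus !== "inactive") {
    return generateId();
  }
  return previous?.episodeId;
}

// Deterministic stand-in for a UUID generator, so the output is predictable.
let counter = 0;
const nextId = (): string => `uuid-${++counter}`;

const first = resolveEpisodeId(undefined, "pending", nextId); // "uuid-1"
const kept = resolveEpisodeId({ status: "pending", episodeId: first }, "active", nextId); // "uuid-1"
const renewed = resolveEpisodeId({ status: "inactive", episodeId: kept }, "pending", nextId); // "uuid-2"
console.log(first, kept, renewed); // uuid-1 uuid-1 uuid-2
```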

Important

Calculating the states based on counts or timeframes will be implemented in the next PR, to keep this PR small and make reviewing the fundamentals of the Director easier. The same applies to streaming the ES|QL results to the Director and to the data stream.

flowchart LR
    subgraph Lifecycle["Episode Lifecycle"]
        direction LR
        
        INACTIVE((INACTIVE))
        PENDING((PENDING))
        ACTIVE((ACTIVE))
        RECOVERING((RECOVERING))
        
        INACTIVE -->|"breached<br/>New Episode ID"| PENDING
        PENDING -->|breached| ACTIVE
        ACTIVE -->|recovered| RECOVERING
        RECOVERING -->|recovered| INACTIVE
        
        RECOVERING -->|breached| ACTIVE
        PENDING -->|"recovered<br/>Episode Ends"| INACTIVE
    end

    style INACTIVE fill:#9e9e9e,color:#fff
    style PENDING fill:#ffc107,color:#000
    style ACTIVE fill:#f44336,color:#fff
    style RECOVERING fill:#ff9800,color:#000

Example

Alert events

Row	@timestamp	Alert event status	Episode status	Episode ID
1	10:00	breached	pending	uuid-1
2	10:05	breached	active	uuid-1
3	10:10	recovered	recovering	uuid-1
4	10:15	recovered	inactive	uuid-1
5	10:20	breached	pending	uuid-2
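To make the example concrete, the sketch below replays the five events above for a single alert, sequentially and in memory, and derives the episode status and episode ID for each row from the lifecycle described earlier. It is an illustration under assumptions, not the Director itself (which works asynchronously on the event stream): replay, TRANSITIONS, and the injected ID generator are hypothetical, and combinations the PR does not describe (such as active + breached) are assumed to keep the current status.

```typescript
// Hypothetical types mirroring the example table; not the PR's actual types.
type AlertEventStatus = "breached" | "recovered";
type EpisodeStatus = "inactive" | "pending" | "active" | "recovering";

interface AlertEvent {
  timestamp: string;
  status: AlertEventStatus;
}

interface EnrichedEvent extends AlertEvent {
  episodeStatus: EpisodeStatus;
  episodeId: string | undefined;
}

// Lifecycle as a lookup table; self-loops (e.g. active + breached) are an assumption.
const TRANSITIONS: Record<EpisodeStatus, Record<AlertEventStatus, EpisodeStatus>> = {
  inactive: { breached: "pending", recovered: "inactive" },
  pending: { breached: "active", recovered: "inactive" },
  active: { breached: "active", recovered: "recovering" },
  recovering: { breached: "active", recovered: "inactive" },
};

// Replays one alert's events in order, starting from inactive with no episode.
function replay(events: AlertEvent[], generateId: () => string): EnrichedEvent[] {
  let status: EpisodeStatus = "inactive";
  let episodeId: string | undefined;
  const out: EnrichedEvent[] = [];
  for (const event of events) {
    const next = TRANSITIONS[status][event.status];
    // A new episode (and ID) starts only when leaving inactive.
    if (status === "inactive" && next !== "inactive") {
      episodeId = generateId();
    }
    status = next;
    out.push({ ...event, episodeStatus: status, episodeId });
  }
  return out;
}

let n = 0;
const rows = replay(
  [
    { timestamp: "10:00", status: "breached" },
    { timestamp: "10:05", status: "breached" },
    { timestamp: "10:10", status: "recovered" },
    { timestamp: "10:15", status: "recovered" },
    { timestamp: "10:20", status: "breached" },
  ],
  () => `uuid-${++n}`,
);

for (const r of rows) {
  console.log(r.timestamp, r.status, r.episodeStatus, r.episodeId);
}
// 10:00 breached pending uuid-1
// 10:05 breached active uuid-1
// 10:10 recovered recovering uuid-1
// 10:15 recovered inactive uuid-1
// 10:20 breached pending uuid-2
```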

Out of scope

State transitions based on counts or timeframes.
Streaming of ES|QL results to the Director and the data stream.

Testing

Create a rule that fires breach events.
Maturation:
- Verify that the alert event documents have the correct episode status, alert event status, and the episode ID on each run.

The recovering state cannot be tested at the moment, because we need the rule executor to produce recovered alert events.
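For the manual verification step, a small invariant checker over one alert's documents (oldest first) could help. The document shape and field names below are hypothetical, since the actual alert event mapping is not shown here, and the checker only covers the episode ID rules described above, not counts or timeframes.

```typescript
// Hypothetical alert event document shape; actual field names in the data stream may differ.
type EpisodeStatus = "inactive" | "pending" | "active" | "recovering";

interface AlertEventDoc {
  status: "breached" | "recovered";
  episodeStatus: EpisodeStatus;
  episodeId: string;
}

// Returns human-readable violations of the episode ID rules for one alert's documents, oldest first.
function findViolations(docs: AlertEventDoc[]): string[] {
  const violations: string[] = [];
  for (let i = 0; i < docs.length; i++) {
    const doc = docs[i];
    const prev = i > 0 ? docs[i - 1] : undefined;
    // No previous document means the alert was inactive.
    const prevStatus: EpisodeStatus = prev ? prev.episodeStatus : "inactive";
    const opensEpisode = prevStatus === "inactive" && doc.episodeStatus !== "inactive";

    if (opensEpisode && doc.episodeStatus !== "pending") {
      violations.push(`row ${i + 1}: a new episode must start as pending`);
    }
    if (opensEpisode && prev && doc.episodeId === prev.episodeId) {
      violations.push(`row ${i + 1}: expected a new episode ID`);
    }
    if (!opensEpisode && prev && doc.episodeId !== prev.episodeId) {
      violations.push(`row ${i + 1}: episode ID changed within an episode`);
    }
  }
  return violations;
}

const docs: AlertEventDoc[] = [
  { status: "breached", episodeStatus: "pending", episodeId: "uuid-1" },
  { status: "breached", episodeStatus: "active", episodeId: "uuid-1" },
  { status: "breached", episodeStatus: "active", episodeId: "uuid-2" }, // wrong: ID changed mid-episode
];
console.log(findViolations(docs)); // one violation, for row 3
```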

Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

Unit or functional tests were updated or added to match the most common scenarios

cnasikas merged 23 commits into elastic:alerting_v2 from cnasikas:alerting_v2_director