[RFC] OpenSearch Events Correlation Engine #6779

@sbcd90

Problem Statement

OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0.
OpenSearch includes a data store and search engine where customers can store their business, operational, and security data from a variety of sources & run search queries on them.

Because customer infrastructure events, such as security events and observability events, span multiple indices and data streams, correlating across these indices (or data streams) helps customers identify patterns and explore the relationships between events occurring in different systems in their infrastructure.

Definitions

Events Correlation Engine

The Correlation Engine is an Events Knowledge Graph that can be used to identify and store connected event data spanning multiple indices or data streams. It also helps generate insights by correlating recent and historical data based on time windows provided by the client.

The Events Correlation Engine helps customers correlate events across log sources: customers define their own Correlation Rules exactly once, and the engine then generates correlations between events from different log sources automatically.

Dimensions of Correlation

Time Window

Time Window is the most basic Dimension of Correlation that the user can define. If no other dimension is provided, the Correlation Engine shows all possible correlations across all indices within the specified time window.
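
As a rough illustration of time-window-only correlation, the sketch below returns every other event whose timestamp falls within a window around a chosen event. The `Event` fields, the function name, and the symmetric (plus or minus) window are assumptions made for this example; the RFC does not specify how the window is anchored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    index: str        # source index or data stream name
    event_id: str
    timestamp: float  # epoch seconds


def correlate_by_time_window(anchor, events, window_seconds):
    """Return all other events within +/- window_seconds of the anchor event."""
    return [
        e for e in events
        if e.event_id != anchor.event_id
        and abs(e.timestamp - anchor.timestamp) <= window_seconds
    ]


events = [
    Event("network", "n1", 1000.0),
    Event("windows", "w1", 1120.0),
    Event("s3", "s1", 1900.0),
]
correlated = correlate_by_time_window(events[0], events, window_seconds=300)
print([e.event_id for e in correlated])  # ['w1']
```

With no other dimension applied, every event inside the window counts as correlated, which is why the later dimensions are needed to narrow the results.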

Source Events Indices/DataStreams

While Time Window is an important dimension of correlation, users also need to provide the source event indices (or data streams) on which Correlation Rules can be defined. These sources act as an additional dimension of correlation.

Query Language for Correlation Rules

The most granular level of correlation supported by the Correlation Engine is using correlation rules or queries over the source events indices or datastreams. These rules allow the Correlation Engine to eliminate false positives & present to the user a list of highly accurate correlated search results.
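
One way to picture rule-based filtering is a rule that holds one predicate per source index and keeps only the events that satisfy the predicate for their own index. The rule shape, the field names, and `apply_rule` are hypothetical and chosen only for this sketch; the RFC does not define a rule format at this level.

```python
# Hypothetical rule: one predicate per source index (illustrative names only).
rule = {
    "network": lambda e: e.get("src_addr") == "4.5.6.7",
    "ad_ldap": lambda e: e.get("ResultType") == 50126,
}


def apply_rule(rule, events_by_index):
    """Keep only events that satisfy the predicate for their own index."""
    return {
        index: [e for e in events_by_index.get(index, []) if predicate(e)]
        for index, predicate in rule.items()
    }


events_by_index = {
    "network": [{"src_addr": "4.5.6.7"}, {"src_addr": "10.0.0.1"}],
    "ad_ldap": [{"ResultType": 50126}, {"ResultType": 0}],
    "s3": [{"eventName": "ReplicateObject"}],
}
matched = apply_rule(rule, events_by_index)
print(matched)
```

Indices that the rule does not mention (here `s3`) are dropped entirely, and non-matching events in the named indices are filtered out, which is the sense in which rules reduce false positives.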

One popular choice for defining Correlation Rules is Event Query Language (EQL) from Elasticsearch. EQL supports the Elastic Common Schema (ECS) today.

Here is a sample EQL-based Correlation Query.

{
  "query": """
      [network where src_addr == "4.5.6.7" and severity_id == -1]
      [ad_ldap where ResultType == 50126]
      [windows where host.hostname == "EC2AMAZ*"]
      [others_application where StatusCode == 403]
      [s3 where aws.cloudtrail.eventName == "ReplicateObject"]
  """
}
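
To make the structure of such a query concrete, here is a minimal sketch that splits a query string into (log source, condition) pairs, one per bracketed clause. The regular expression and `parse_clauses` are assumptions for illustration, not an EQL parser, and they would not handle conditions that themselves contain `]`.

```python
import re

# Matches one bracketed clause of the form "[<log source> where <condition>]".
CLAUSE = re.compile(r"\[\s*(\w+)\s+where\s+(.+?)\s*\]")


def parse_clauses(query):
    """Split a correlation query into (log_source, condition) pairs."""
    return [(m.group(1), m.group(2)) for m in CLAUSE.finditer(query)]


sample = '''
    [network where src_addr == "4.5.6.7" and severity_id == -1]
    [ad_ldap where ResultType == 50126]
'''
for source, condition in parse_clauses(sample):
    print(source, "->", condition)
```

Each pair names a source index (or data stream) and the condition that events from that source must satisfy.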

High-Level Design

The design of the Events Correlation Engine has two high-level components.

Correlation Query Service

This sub-system manages the lifecycle of the Correlation Rules created by users. Users can create, read, update, or delete rules using the REST APIs provided by this layer.
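
The rule lifecycle can be pictured with a small in-memory store. This is only a stand-in for the create, read, update, and delete operations; the class name, method names, and rule fields are hypothetical, and the RFC does not define the REST endpoints or payloads.

```python
import uuid


class CorrelationRuleStore:
    """Hypothetical in-memory stand-in for the rule lifecycle operations."""

    def __init__(self):
        self._rules = {}  # rule_id -> {"name": ..., "query": ...}

    def create(self, name, query):
        rule_id = str(uuid.uuid4())
        self._rules[rule_id] = {"name": name, "query": query}
        return rule_id

    def read(self, rule_id):
        return self._rules[rule_id]

    def update(self, rule_id, query):
        self._rules[rule_id]["query"] = query

    def delete(self, rule_id):
        del self._rules[rule_id]


store = CorrelationRuleStore()
rule_id = store.create("failed-logins", "[ad_ldap where ResultType == 50126]")
store.update(rule_id, "[ad_ldap where ResultType == 50126]\n[others_application where StatusCode == 403]")
print(store.read(rule_id)["name"])
```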

The language for defining Correlation Rules is not yet finalized. EQL is one candidate.

Correlation Service

The internals of the Correlation Engine are composed of four major components.

  • HNSW Graph based vector storage - storage based on a Hierarchical Navigable Small World (HNSW) graph, used to store all event vectors & query them at the vector level.
  • Insertion Handler - the most important piece of the Correlation Engine is its insertion algorithm. In this layer, events are converted to k-dimensional vectors & stored, along with their correlations, in the vector storage layer mentioned above.
  • Search Handler - the second most important piece of the Correlation Engine. It lets the user specify a particular event, converts that event to a k-dimensional vector, & uses the vector to query for neighboring events, which are the event's correlated events within a time window.
  • Join Handler - the Join task determines the immediate neighbors of a particular event, given the correlation rules defined by the user for the log indices (or data streams) they wish to correlate.
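
The insertion and search paths described above can be sketched end to end as follows. Two substitutions are made for brevity and should be read as assumptions: events are embedded by hashing field=value pairs into k buckets (the RFC does not say how events become vectors), and a brute-force linear scan stands in for the HNSW graph. All names here are hypothetical.

```python
import math
import zlib

K = 8  # vector dimensionality; an arbitrary choice for this sketch


def to_vector(event, k=K):
    """Hypothetical embedding: hash each field=value pair into one of k buckets."""
    vec = [0.0] * k
    for field, value in event["fields"].items():
        bucket = zlib.crc32(f"{field}={value}".encode()) % k
        vec[bucket] += 1.0
    return vec


class BruteForceVectorStore:
    """Stand-in for the HNSW-backed storage; scans every stored vector."""

    def __init__(self):
        self._items = []  # list of (vector, event)

    def insert(self, event):
        # Insertion path: convert the event to a vector and store it.
        self._items.append((to_vector(event), event))

    def search(self, event, window_seconds, top_n=3):
        # Search path: nearest stored vectors, restricted to the time window.
        query = to_vector(event)
        candidates = [
            (math.dist(query, vec), other)
            for vec, other in self._items
            if other["id"] != event["id"]
            and abs(other["timestamp"] - event["timestamp"]) <= window_seconds
        ]
        candidates.sort(key=lambda pair: pair[0])
        return [other for _, other in candidates[:top_n]]


store = BruteForceVectorStore()
a = {"id": "n1", "timestamp": 1000, "fields": {"src_addr": "4.5.6.7"}}
b = {"id": "w1", "timestamp": 1100, "fields": {"src_addr": "4.5.6.7", "host": "EC2AMAZ-1"}}
c = {"id": "s1", "timestamp": 9000, "fields": {"src_addr": "4.5.6.7"}}
for e in (a, b, c):
    store.insert(e)
print([e["id"] for e in store.search(a, window_seconds=300)])  # ['w1']
```

Event `s1` shares a field value with `n1` but falls outside the 300-second window, so only `w1` is returned. A linear scan costs time proportional to the number of stored events per query, which is the cost an approximate index such as HNSW is meant to avoid.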

Use Cases

Security Analytics Correlation Engine for correlating security events

Security Analytics is an open-source solution for security operations in OpenSearch. Security Analytics’ threat detection engine converts the detection rules into executable OpenSearch queries which are then matched against the logs or events ingested by the user to generate findings. The trigger condition filters are further applied on the findings to generate alerts.

Today in Security Analytics, the generated findings belong to individual log types & there is no way to correlate them automatically. Users need to browse through the findings generated for individual log categories & identify patterns manually.

The Security Analytics Correlation Engine addresses this issue by allowing customers to define the correlation metadata across log categories exactly once & then generating correlations between findings from different log categories automatically.

Here is the link to the RFC
