Skip to content

[Security Solution][Siem Migrations] POC: SPL rule translation to ES|QL#196651

Closed
semd wants to merge 21 commits intoelastic:mainfrom
semd:10653_spl-esql-translation-api
Closed

[Security Solution][Siem Migrations] POC: SPL rule translation to ES|QL#196651
semd wants to merge 21 commits intoelastic:mainfrom
semd:10653_spl-esql-translation-api

Conversation

@semd
Copy link
Contributor

@semd semd commented Oct 17, 2024

@elasticmachine
Copy link
Contributor

🤖 Jobs for this PR can be triggered through checkboxes. 🚧

ℹ️ To trigger the CI, please tick the checkbox below 👇

  • Click to trigger kibana-pull-request for this PR!
  • Click to trigger kibana-deploy-project-from-pr for this PR!

semd added a commit that referenced this pull request Nov 6, 2024
…97997)

## Summary

It implements the background task to execute the rule migrations and the
API to manage them. It also contains a basic implementation of the
langGraph agent workflow that will perform the migration using
generative AI.

> [!NOTE]  
> This feature needs `siemMigrationsEnabled` experimental flag enabled
to work. Otherwise, the new API routes won't be registered, and the
`SiemRuleMigrationsService` _setup_ won't be called. So no migration
task code can be reached, and no data stream/template will be installed
to ES.

### The rule migration task implementation:

- Retrieve a batch of N rule migration documents (50 rules initially, we
may change that later) with `status: pending`.
- Update those documents to `status: processing`.
- Execute the migration for each of the N migrations in parallel.
- If there is any error update the document with `status: error`.
- For each rule migration that finishes we set the result to the
storage, and also update `status: finished`.
- When all the batch of rules is finished the task will check if there
are still migration documents with `status: pending` if so it will
process the next batch with a delay (10 seconds initially, we may change
that later).
- If the task is stopped (via API call or server shut-down), we do a
bulk update for all the `status: processing` documents back to `status:
pending`.

### Task API

- `POST /internal/siem_migrations/rules` (implemented
[here](elastic/security-team#10654)) ->
Creates the migration on the backend and stores the original rules. It
returns the `migration_id`
- `GET /internal/siem_migrations/rules/stats` -> Retrieves the stats for
all the existing migrations, aggregated by `migration_id`.
- `GET /internal/siem_migrations/rules/{migration_id}` -> Retrieves all
the migration rule documents of a specific migration.
- `PUT /internal/siem_migrations/rules/{migration_id}/start` -> Starts
the background task for a specific migration.
- `GET /internal/siem_migrations/rules/{migration_id}/stats` ->
Retrieves the stats of a specific migration task. The UI will do polling
to this endpoint.
- `PUT /internal/siem_migrations/rules/{migration_id}/stop` -> Stops the
execution of a specific migration running task. When a migration is
stopped, the executing task is aborted and all the rules in the batch
being processed are moved back to pending, all finished rules will
remain stored. When the Kibana server shuts down all the running
migrations are stopped automatically. To resume the migration we can
call `{migration_id}/start` again and it will take it from the same
rules batch it was left.

#### Stats (UI polling) response example:
```
{
    "status": "running",
    "rules": {
        "total": 34,
        "finished": 20,
        "pending": 4,
        "processing": 10,
        "failed": 0
    },
    "last_updated_at": "2024-10-29T15:04:49.618Z"
}
```

### LLM agent Graph

The initial implementation of the agent graph that is executed per rule:

![agent graph
diagram](https://github.com/user-attachments/assets/9228350c-a469-449b-a58a-0b452bb805aa)

The first node tries to match the original rule with an Elastic prebuilt
rule. If it does not succeed, the second node will try to translate the
query as a custom rule using the ES|QL knowledge base, this composes
previous PoCs:
- #193900
- #196651



## Testing locally

Enable the flag
```
xpack.securitySolution.enableExperimental: ['siemMigrationsEnabled']
```

cURL request examples:

<details>
  <summary>Rules migration `create` POST request</summary>

```
curl --location --request POST 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '[
    {
        "id": "f8c325ea-506e-4105-8ccf-da1492e90115",
        "vendor": "splunk",
        "title": "Linux Auditd Add User Account Type",
        "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network. Detecting and responding to these signs early is essential to prevent potential security incidents.",
        "query": "sourcetype=\"linux:audit\" type=ADD_USER \n| rename hostname as dest \n| stats count min(_time) as firstTime max(_time) as lastTime by exe pid dest res UID type \n| `security_content_ctime(firstTime)` \n| `security_content_ctime(lastTime)`\n| search *",
        "query_language":"spl",
        "mitre_attack_ids": [
            "T1136"
        ]
    },
    {
        "id": "7b87c556-0ca4-47e0-b84c-6cd62a0a3e90",
        "vendor": "splunk",
        "title": "Linux Auditd Change File Owner To Root",
        "description": "The following analytic detects the use of the '\''chown'\'' command to change a file owner to '\''root'\'' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.",
        "query": "`linux_auditd` `linux_auditd_normalized_proctitle_process`\r\n| rename host as dest \r\n| where LIKE (process_exec, \"%chown %root%\") \r\n| stats count min(_time) as firstTime max(_time) as lastTime by process_exec proctitle normalized_proctitle_delimiter dest \r\n| `security_content_ctime(firstTime)` \r\n| `security_content_ctime(lastTime)`\r\n| `linux_auditd_change_file_owner_to_root_filter`",
        "query_language": "spl",
        "mitre_attack_ids": [
            "T1222"
        ]
    }
]'
```
</details>

<details>
  <summary>Rules migration `start` task request</summary>

- Assuming the connector `azureOpenAiGPT4o` is already created in the
local environment.
- Using the {{`migration_id`}} from the first POST request response

```
curl --location --request PUT 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/start' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '{
    "connectorId": "azureOpenAiGPT4o"
}'
```
</details>

<details>
  <summary>Rules migration `stop` task request</summary>

- Using the {{`migration_id`}} from the first POST request response.

```
curl --location --request PUT 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/stop' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' 
```
</details>


<details>
  <summary>Rules migration task `stats` request</summary>

- Using the {{`migration_id`}} from the first POST request response.

```
curl --location --request GET 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/stats' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' 
```
</details>

<details>
  <summary>Rules migration rules documents request</summary>

- Using the {{`migration_id`}} from the first POST request response.

```
curl --location --request GET 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' 
```
</details>

<details>
  <summary>Rules migration all stats request</summary>

```
curl --location --request GET 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/stats' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' 
```
</details>

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
semd added a commit to semd/kibana that referenced this pull request Nov 6, 2024
…astic#197997)

## Summary

It implements the background task to execute the rule migrations and the
API to manage them. It also contains a basic implementation of the
langGraph agent workflow that will perform the migration using
generative AI.

> [!NOTE]
> This feature needs `siemMigrationsEnabled` experimental flag enabled
to work. Otherwise, the new API routes won't be registered, and the
`SiemRuleMigrationsService` _setup_ won't be called. So no migration
task code can be reached, and no data stream/template will be installed
to ES.

### The rule migration task implementation:

- Retrieve a batch of N rule migration documents (50 rules initially, we
may change that later) with `status: pending`.
- Update those documents to `status: processing`.
- Execute the migration for each of the N migrations in parallel.
- If there is any error update the document with `status: error`.
- For each rule migration that finishes we set the result to the
storage, and also update `status: finished`.
- When all the batch of rules is finished the task will check if there
are still migration documents with `status: pending` if so it will
process the next batch with a delay (10 seconds initially, we may change
that later).
- If the task is stopped (via API call or server shut-down), we do a
bulk update for all the `status: processing` documents back to `status:
pending`.

### Task API

- `POST /internal/siem_migrations/rules` (implemented
[here](elastic/security-team#10654)) ->
Creates the migration on the backend and stores the original rules. It
returns the `migration_id`
- `GET /internal/siem_migrations/rules/stats` -> Retrieves the stats for
all the existing migrations, aggregated by `migration_id`.
- `GET /internal/siem_migrations/rules/{migration_id}` -> Retrieves all
the migration rule documents of a specific migration.
- `PUT /internal/siem_migrations/rules/{migration_id}/start` -> Starts
the background task for a specific migration.
- `GET /internal/siem_migrations/rules/{migration_id}/stats` ->
Retrieves the stats of a specific migration task. The UI will do polling
to this endpoint.
- `PUT /internal/siem_migrations/rules/{migration_id}/stop` -> Stops the
execution of a specific migration running task. When a migration is
stopped, the executing task is aborted and all the rules in the batch
being processed are moved back to pending, all finished rules will
remain stored. When the Kibana server shuts down all the running
migrations are stopped automatically. To resume the migration we can
call `{migration_id}/start` again and it will take it from the same
rules batch it was left.

#### Stats (UI polling) response example:
```
{
    "status": "running",
    "rules": {
        "total": 34,
        "finished": 20,
        "pending": 4,
        "processing": 10,
        "failed": 0
    },
    "last_updated_at": "2024-10-29T15:04:49.618Z"
}
```

### LLM agent Graph

The initial implementation of the agent graph that is executed per rule:

![agent graph
diagram](https://github.com/user-attachments/assets/9228350c-a469-449b-a58a-0b452bb805aa)

The first node tries to match the original rule with an Elastic prebuilt
rule. If it does not succeed, the second node will try to translate the
query as a custom rule using the ES|QL knowledge base, this composes
previous PoCs:
- elastic#193900
- elastic#196651

## Testing locally

Enable the flag
```
xpack.securitySolution.enableExperimental: ['siemMigrationsEnabled']
```

cURL request examples:

<details>
  <summary>Rules migration `create` POST request</summary>

```
curl --location --request POST 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '[
    {
        "id": "f8c325ea-506e-4105-8ccf-da1492e90115",
        "vendor": "splunk",
        "title": "Linux Auditd Add User Account Type",
        "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network. Detecting and responding to these signs early is essential to prevent potential security incidents.",
        "query": "sourcetype=\"linux:audit\" type=ADD_USER \n| rename hostname as dest \n| stats count min(_time) as firstTime max(_time) as lastTime by exe pid dest res UID type \n| `security_content_ctime(firstTime)` \n| `security_content_ctime(lastTime)`\n| search *",
        "query_language":"spl",
        "mitre_attack_ids": [
            "T1136"
        ]
    },
    {
        "id": "7b87c556-0ca4-47e0-b84c-6cd62a0a3e90",
        "vendor": "splunk",
        "title": "Linux Auditd Change File Owner To Root",
        "description": "The following analytic detects the use of the '\''chown'\'' command to change a file owner to '\''root'\'' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.",
        "query": "`linux_auditd` `linux_auditd_normalized_proctitle_process`\r\n| rename host as dest \r\n| where LIKE (process_exec, \"%chown %root%\") \r\n| stats count min(_time) as firstTime max(_time) as lastTime by process_exec proctitle normalized_proctitle_delimiter dest \r\n| `security_content_ctime(firstTime)` \r\n| `security_content_ctime(lastTime)`\r\n| `linux_auditd_change_file_owner_to_root_filter`",
        "query_language": "spl",
        "mitre_attack_ids": [
            "T1222"
        ]
    }
]'
```
</details>

<details>
  <summary>Rules migration `start` task request</summary>

- Assuming the connector `azureOpenAiGPT4o` is already created in the
local environment.
- Using the {{`migration_id`}} from the first POST request response

```
curl --location --request PUT 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/start' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '{
    "connectorId": "azureOpenAiGPT4o"
}'
```
</details>

<details>
  <summary>Rules migration `stop` task request</summary>

- Using the {{`migration_id`}} from the first POST request response.

```
curl --location --request PUT 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/stop' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1'
```
</details>

<details>
  <summary>Rules migration task `stats` request</summary>

- Using the {{`migration_id`}} from the first POST request response.

```
curl --location --request GET 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/stats' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1'
```
</details>

<details>
  <summary>Rules migration rules documents request</summary>

- Using the {{`migration_id`}} from the first POST request response.

```
curl --location --request GET 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1'
```
</details>

<details>
  <summary>Rules migration all stats request</summary>

```
curl --location --request GET 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/stats' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1'
```
</details>

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit cc66320)

# Conflicts:
#	x-pack/plugins/security_solution/common/api/quickstart_client.gen.ts
#	x-pack/plugins/security_solution/server/types.ts
#	x-pack/test/api_integration/services/security_solution_api.gen.ts
@semd semd closed this Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants