[Security Solution][Detection Engine] adds async ES|QL query#216667

Merged
vitaliidm merged 30 commits into elastic:main from vitaliidm:de_9_1/async_esql
Apr 17, 2025
Conversation

@vitaliidm vitaliidm commented Apr 1, 2025

Summary

Introducing an async query allows ES|QL rules to overcome the ES request timeout for long-running rules and queries.

The ES request timeout is defined in the alerting framework (get_es_request_timeout.ts) and is the smaller of the rule execution timeout and the default ES request timeout (which is 5m and hardcoded in get_rule_task_timeout.ts).

If an ES|QL rule performs a single long-running ES query, it can time out after 5m because of this ES request timeout. This value can't be changed, unlike the rule execution timeout, which can be overridden in Kibana config:

xpack.alerting.rules.run:
  timeout: '10m'
  ruleTypeOverrides:
    - id:  'siem.esqlRule'
      timeout: '15m'

So a rule can fail after 5m because of the ES request timeout, even though it is configured with a longer timeout of 15m.
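
To make the arithmetic concrete, here is a minimal sketch of how the effective ES request timeout follows from the two values described above. The function and constant names are illustrative and not taken from the alerting framework; only the "smaller of the two" rule and the 5m default come from the description.

```typescript
// Default ES request timeout described above (5 minutes), in milliseconds.
// The constant name is hypothetical.
const DEFAULT_ES_REQUEST_TIMEOUT_MS = 5 * 60 * 1000;

// Returns the smaller of the rule execution timeout and the default ES request timeout.
function effectiveEsRequestTimeoutMs(ruleTimeoutMs: number): number {
  return Math.min(ruleTimeoutMs, DEFAULT_ES_REQUEST_TIMEOUT_MS);
}

// A rule configured with a 15m execution timeout still gets a 5m ES request timeout.
console.log(effectiveEsRequestTimeoutMs(15 * 60 * 1000)); // 300000 (5m)

// A rule with a 1m execution timeout is limited by its own timeout instead.
console.log(effectiveEsRequestTimeoutMs(60 * 1000)); // 60000 (1m)
```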

By using an async query, we can overcome this limitation and poll for the async query results until the query completes or the rule times out.
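
A minimal sketch of that flow, assuming the ES|QL async query endpoints (`POST /_query/async`, `GET /_query/async/{id}`, `DELETE /_query/async/{id}`) and a `transport.request`-style client. `MinimalTransport`, `runAsyncEsql`, and the `shouldStop` and `sleep` parameters are illustrative names, not the code from this PR, and the timeout values are placeholders.

```typescript
// Illustrative response shape for the async ES|QL endpoints.
interface AsyncEsqlResponse {
  id?: string; // returned while the query is still running
  is_running: boolean;
  columns?: Array<{ name: string; type: string }>;
  values?: unknown[][];
}

// Hypothetical stand-in for the part of the ES client transport used here.
interface MinimalTransport {
  request<T>(params: { method: string; path: string; body?: unknown }): Promise<T>;
}

async function runAsyncEsql(
  transport: MinimalTransport,
  query: string,
  shouldStop: () => boolean, // e.g. backed by the rule's shouldStopExecution
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  pollIntervalMs = 10000
): Promise<AsyncEsqlResponse> {
  // Start the query. ES returns an id if it does not finish within the wait window.
  const started = await transport.request<AsyncEsqlResponse>({
    method: 'POST',
    path: '/_query/async',
    body: { query, wait_for_completion_timeout: '5s', keep_alive: '15m' }, // placeholder values
  });
  if (!started.is_running || started.id === undefined) {
    return started;
  }

  const queryId = started.id;
  try {
    while (true) {
      await sleep(pollIntervalMs);
      if (shouldStop()) {
        throw new Error('Rule execution cancelled due to timeout');
      }
      const polled = await transport.request<AsyncEsqlResponse>({
        method: 'GET',
        path: `/_query/async/${queryId}`,
      });
      if (!polled.is_running) {
        return polled;
      }
    }
  } finally {
    // Best-effort cleanup of the stored async query, on success and on cancellation.
    await transport
      .request<unknown>({ method: 'DELETE', path: `/_query/async/${queryId}` })
      .catch(() => undefined);
  }
}
```

Each individual request in this sketch is short, so none of them runs into the 5m ES request timeout; only the rule's own timeout (through `shouldStop`) ends the loop.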

More details in internal issue

@vitaliidm vitaliidm self-assigned this Apr 1, 2025
@vitaliidm vitaliidm added release_note:skip, Team: SecuritySolution, Team:Detection Engine labels Apr 1, 2025
@vitaliidm vitaliidm marked this pull request as ready for review April 3, 2025 09:43
@vitaliidm vitaliidm requested review from a team as code owners April 3, 2025 09:43
@vitaliidm vitaliidm requested a review from rylnd April 3, 2025 09:43
@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine

Pinging @elastic/security-detection-engine (Team:Detection Engine)

@vitaliidm vitaliidm added v9.1.0, v8.19.0, backport:version labels Apr 3, 2025
@yctercero yctercero requested a review from marshallmain April 3, 2025 18:02
description: i18n.ESQL_SEARCH_REQUEST_DESCRIPTION,
});
const asyncSearchStarted = performance.now();
const asyncEsqlResponse = await esClient.transport.request<AsyncEsqlResponse>({
Contributor

why use transport.request instead of .esql?

Contributor Author

I noticed that the esql method returns incorrect typings, so I decided to stick with transport.request, as in the previous implementation.

requestBody: Record<string, unknown>;
requestBody: {
query: string;
filter: QueryDslQueryContainer;
Contributor

wait_for_completion_timeout and keep_alive should probably be params here too

Contributor Author

I haven't added them, since they were not used.
Added them now

Contributor Author

It turned out these values apply to the request body only, so I removed them from this type.

if (isCancelled) {
throw new Error('Rule execution cancelled due to timeout');
}
await new Promise((resolve) => setTimeout(resolve, pollInterval));
Contributor

probably want to wait at the beginning of the loop so we wait in between the initial response and the first poll
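
One way to read this suggestion is as a loop that sleeps before each poll rather than after it. The sketch below is illustrative only: `pollUntilDone`, `poll`, `isCancelled`, and `sleep` are hypothetical names, and the loop in the PR may be structured differently.

```typescript
interface PollResult<T> {
  done: boolean;
  value?: T;
}

async function pollUntilDone<T>(
  poll: () => Promise<PollResult<T>>,
  isCancelled: () => boolean,
  pollIntervalMs: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T | undefined> {
  while (true) {
    // Wait first, so there is a pause between the initial response and the first poll.
    await sleep(pollIntervalMs);
    // Check for cancellation after waking up, before issuing another request.
    if (isCancelled()) {
      throw new Error('Rule execution cancelled due to timeout');
    }
    const result = await poll();
    if (result.done) {
      return result.value;
    }
  }
}
```

Injecting `sleep` keeps the ordering easy to verify in a unit test without real timers.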

filter: requestFilter,
},
},
wait_for_completion_timeout: '4m', // hard limit request timeout is 5m set by ES proxy and alerting framework. So, we should be fine to wait 4m for async query completion. If rule execution is shorter than 4m and query was not completed, it will be aborted.
Contributor

setting wait_for_completion_timeout this high makes this effectively a synchronous query. Should this be keep_alive instead? Some keep_alive value longer than the rule timeout would be sufficient. I think we want wait_for_completion_timeout to be some number of seconds, like 5 or 10 seconds.

Setting keep_alive will help ensure that the results are deleted quickly even if the cleanup DELETE request fails.

Contributor Author

Yes, the idea was not to start polling until the rule times out. If the rule timeout is greater than 5m, where we can hit the ES request timeout limitation, we would stop waiting for the query to complete (after 4m) and start polling.

> Setting keep_alive will help ensure that the results are deleted quickly even if the cleanup DELETE request fails.

We don't have access to the rule timeout within the executor, only to shouldStopExecution, which can't be used to set up keep_alive beforehand. I am not sure we want to expose this to the executor and, in the future, rely on that value while running the rule.

Contributor

I see, that makes sense. Is 4m the right value for both ECH and serverless? The default rule timeout is different for serverless, right? Is the default connection timeout different?

I think it would be worth following up on this PR to see if we can get access to the rule timeout or maybe have the framework wrapper inject a keep_alive value depending on the rule timeout. There could be some use case eventually for async requests initiated by one rule execution and retrieved by a later one, so maybe not every async request needs to have keep_alive == rule timeout, but I think it's the typical scenario for us. It'll just help with system resilience to set it since there could be hundreds of ESQL rules running every few minutes and we'll never need those results for 5 days.
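
As a rough illustration of this follow-up idea, a framework wrapper could derive `keep_alive` from the rule timeout once that value is available. Everything here is hypothetical: `keepAliveFromRuleTimeout`, `withKeepAlive`, and the request-body shape are assumptions made for the sketch, not existing alerting framework helpers.

```typescript
// Converts a rule timeout in milliseconds into an ES time-unit string in whole seconds.
// Rounds up so the stored results do not expire before the rule itself would time out.
function keepAliveFromRuleTimeout(ruleTimeoutMs: number): string {
  if (!Number.isFinite(ruleTimeoutMs) || ruleTimeoutMs <= 0) {
    throw new Error(`Invalid rule timeout: ${ruleTimeoutMs}`);
  }
  return `${Math.ceil(ruleTimeoutMs / 1000)}s`;
}

// Assumed shape of the async ES|QL request body.
interface AsyncEsqlRequestBody {
  query: string;
  wait_for_completion_timeout?: string;
  keep_alive?: string;
}

// Sets keep_alive from the rule timeout unless the caller already chose one
// (for example, a request whose results a later execution is meant to pick up).
function withKeepAlive(body: AsyncEsqlRequestBody, ruleTimeoutMs: number): AsyncEsqlRequestBody {
  return body.keep_alive !== undefined
    ? body
    : { ...body, keep_alive: keepAliveFromRuleTimeout(ruleTimeoutMs) };
}

console.log(withKeepAlive({ query: 'FROM logs-* | LIMIT 10' }, 5 * 60 * 1000));
// { query: 'FROM logs-* | LIMIT 10', keep_alive: '300s' }
```

Leaving an explicit `keep_alive` untouched covers the case mentioned above where one execution starts a request and a later one retrieves it.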

Contributor Author

> I see, that makes sense. Is 4m the right value for both ECH and serverless? The default rule timeout is different for serverless, right? Is the default connection timeout different?

The default rule timeout for serverless is 1m. If the async query request runs longer than this, it is aborted and the query is cancelled and deleted in ES.
It's 5m for ECH. If the query takes longer than this to finish, it is cancelled and deleted as well.

Contributor Author

@marshallmain , issue to expose rule timeout to rule executor: #218072

Contributor

Sounds good, we'll just have to be aware of any timeout discrepancies in the future. If the rule timeout increases in serverless (which is a desirable change for us) but the connection timeout values are different (e.g. lower in serverless), we could start seeing environment specific failures.

Contributor Author

I think the errors would be in the same category: timeout related. They may just occur for different requests (query or poll) and at different times. But we already have different timeouts for different environments.

@vitaliidm vitaliidm enabled auto-merge (squash) April 15, 2025 16:25
@vitaliidm vitaliidm disabled auto-merge April 15, 2025 17:17
@rylnd rylnd left a comment

Reviewed previously, but gave it another pass since I was still listed as blocking merge.

LGTM.

import { logEsqlRequest } from '../utils/logged_requests';
import * as i18n from '../translations';

const logDuration = (startTime: number, loggedRequests: RulePreviewLoggedRequest[] | undefined) => {
Contributor

Nit: would this more accurately be something like

Suggested change
- const logDuration = (startTime: number, loggedRequests: RulePreviewLoggedRequest[] | undefined) => {
+ const setLatestRequestDuration = (startTime: number, loggedRequests: RulePreviewLoggedRequest[] | undefined) => {

Since this function seems to only permute loggedRequests, and doesn't actually "log" anything itself?
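
For context, a function with the suggested name might look like the sketch below. The `LoggedRequest` shape (with a `duration` field in milliseconds) and the injected `now` clock are assumptions made for illustration; the actual `RulePreviewLoggedRequest` type and the body of the PR's function are not shown in this thread.

```typescript
// Hypothetical stand-in for RulePreviewLoggedRequest.
interface LoggedRequest {
  request?: string;
  description?: string;
  duration?: number; // milliseconds
}

// Sets the duration on the most recently logged request, mutating that entry in place.
// Does nothing when request logging is disabled (undefined) or nothing has been logged yet.
// startTime must come from the same clock as `now`.
function setLatestRequestDuration(
  startTime: number,
  loggedRequests: LoggedRequest[] | undefined,
  now: () => number = () => Date.now()
): void {
  if (loggedRequests === undefined || loggedRequests.length === 0) {
    return;
  }
  const latest = loggedRequests[loggedRequests.length - 1];
  if (latest) {
    latest.duration = Math.round(now() - startTime);
  }
}
```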

@elasticmachine

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #13 / TemplatesList renders template details correctly

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| alerting | 852 | 853 | +1 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| alerting | 886 | 887 | +1 |

cc @vitaliidm

@vitaliidm vitaliidm merged commit 3d7aac1 into elastic:main Apr 17, 2025
11 checks passed
@kibanamachine

Starting backport for target branches: 8.18, 8.19, 9.0

https://github.com/elastic/kibana/actions/runs/14517920458

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Apr 17, 2025
…#216667)

## Summary

- addresses elastic/security-team#11116 (list
item 2)

Introducing async query would allow to overcome ES request timeout for
long running rules and queries.

Timeout for ES request is [defined in alerting
framework](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_es_request_timeout.ts#L21)
and is smaller value out of rule execution timeout or default ES request
timeout(which is 5m and hardcoded
[here](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_rule_task_timeout.ts)).

If ES|QL rule performs a single long-running ES query, it can time out
after 5m due to this ES request timeout. This value can't be changed,
unlike rule execution timeout. It can be overwritten in Kibana config

```
xpack.alerting.rules.run:
  timeout: '10m'
  ruleTypeOverrides:
    - id:  'siem.esqlRule'
      timeout: '15m'
```
So, we can encounter situations when rule fails execution after 5m due
to ES request timeout, despite a fact it configured with longer timeout
of 15m

By using async query, we can overcome this limitation and can poll async
query results until it completes or rule timeouts

More details in internal
[issue](elastic/sdh-security-team#1224)

---------

Co-authored-by: Ryland Herrick <ryalnd@gmail.com>
(cherry picked from commit 3d7aac1)
@kibanamachine

💔 Some backports could not be created

Status Branch Result
8.18 Backport failed because of merge conflicts
8.19
9.0 Backport failed because of merge conflicts

Note: Successful backport PRs will be merged automatically after passing CI.

Manual backport

To create the backport manually run:

node scripts/backport --pr 216667

Questions ?

Please refer to the Backport tool documentation

vitaliidm added a commit to vitaliidm/kibana that referenced this pull request Apr 17, 2025

# Conflicts:
#	x-pack/solutions/security/plugins/security_solution/server/lib/detection_engine/rule_preview/api/preview_rules/route.ts
#	x-pack/solutions/security/plugins/security_solution/server/lib/detection_engine/rule_types/esql/esql.ts
@vitaliidm

💚 All backports created successfully

Status Branch Result
9.0
8.18

Note: Successful backport PRs will be merged automatically after passing CI.

vitaliidm added a commit to vitaliidm/kibana that referenced this pull request Apr 17, 2025
kibanamachine added a commit that referenced this pull request Apr 17, 2025
…216667) (#218567)

# Backport

This will backport the following commits from `main` to `8.19`:
- [[Security Solution][Detection Engine] adds async ES|QL query
(#216667)](#216667)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Vitalii Dmyterko <92328789+vitaliidm@users.noreply.github.com>
vitaliidm added a commit that referenced this pull request Apr 17, 2025
…216667) (#218583)

# Backport

This will backport the following commits from `main` to `9.0`:
- [[Security Solution][Detection Engine] adds async ES|QL query
(#216667)](#216667)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

vitaliidm added a commit that referenced this pull request Apr 17, 2025
…216667) (#218585)

# Backport

This will backport the following commits from `main` to `8.18`:
- [[Security Solution][Detection Engine] adds async ES|QL query
(#216667)](#216667)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

<!--BACKPORT [{"author":{"name":"Vitalii
Dmyterko","email":"92328789+vitaliidm@users.noreply.github.com"},"sourceCommit":{"committedDate":"2025-04-17T14:23:07Z","message":"[Security
Solution][Detection Engine] adds async ES|QL query (#216667)\n\n##
Summary\n\n- addresses
elastic/security-team#11116 (list\nitem
2)\n\nIntroducing async query would allow to overcome ES request timeout
for\nlong running rules and queries.\n\nTimeout for ES request is
[defined in
alerting\nframework](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_es_request_timeout.ts#L21)\nand
is smaller value out of rule execution timeout or default ES
request\ntimeout(which is 5m and
hardcoded\n[here](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_rule_task_timeout.ts)).\n\nIf
ES|QL rule performs a single long-running ES query, it can time
out\nafter 5m due to this ES request timeout. This value can't be
changed,\nunlike rule execution timeout. It can be overwritten in Kibana
config\n\n```\nxpack.alerting.rules.run:\n timeout: '10m'\n
ruleTypeOverrides:\n - id: 'siem.esqlRule'\n timeout: '15m'\n```\nSo, we
can encounter situations when rule fails execution after 5m due\nto ES
request timeout, despite a fact it configured with longer timeout\nof
15m\n\nBy using async query, we can overcome this limitation and can
poll async\nquery results until it completes or rule timeouts\n\nMore
details in
internal\n[issue](https://github.com/elastic/sdh-security-team/issues/1224)\n\n---------\n\nCo-authored-by:
Ryland Herrick
<ryalnd@gmail.com>","sha":"3d7aac1a443092ebdbc20fbd9345d373bcb16c48","branchLabelMapping":{"^v9.1.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:
SecuritySolution","Team:Detection
Engine","backport:version","v9.1.0","v8.19.0","v8.18.1","v9.0.1"],"title":"[Security
Solution][Detection Engine] adds async ES|QL
query","number":216667,"url":"https://github.com/elastic/kibana/pull/216667","mergeCommit":{"message":"[Security
Solution][Detection Engine] adds async ES|QL query (#216667)\n\n##
Summary\n\n- addresses
elastic/security-team#11116 (list\nitem
2)\n\nIntroducing async query would allow to overcome ES request timeout
for\nlong running rules and queries.\n\nTimeout for ES request is
[defined in
alerting\nframework](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_es_request_timeout.ts#L21)\nand
is smaller value out of rule execution timeout or default ES
request\ntimeout(which is 5m and
hardcoded\n[here](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_rule_task_timeout.ts)).\n\nIf
ES|QL rule performs a single long-running ES query, it can time
out\nafter 5m due to this ES request timeout. This value can't be
changed,\nunlike rule execution timeout. It can be overwritten in Kibana
config\n\n```\nxpack.alerting.rules.run:\n timeout: '10m'\n
ruleTypeOverrides:\n - id: 'siem.esqlRule'\n timeout: '15m'\n```\nSo, we
can encounter situations when rule fails execution after 5m due\nto ES
request timeout, despite a fact it configured with longer timeout\nof
15m\n\nBy using async query, we can overcome this limitation and can
poll async\nquery results until it completes or rule timeouts\n\nMore
details in
internal\n[issue](https://github.com/elastic/sdh-security-team/issues/1224)\n\n---------\n\nCo-authored-by:
Ryland Herrick
<ryalnd@gmail.com>","sha":"3d7aac1a443092ebdbc20fbd9345d373bcb16c48"}},"sourceBranch":"main","suggestedTargetBranches":["8.18","9.0"],"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/216667","number":216667,"mergeCommit":{"message":"[Security
Solution][Detection Engine] adds async ES|QL query (#216667)\n\n##
Summary\n\n- addresses
elastic/security-team#11116 (list\nitem
2)\n\nIntroducing async query would allow to overcome ES request timeout
for\nlong running rules and queries.\n\nTimeout for ES request is
[defined in
alerting\nframework](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_es_request_timeout.ts#L21)\nand
is smaller value out of rule execution timeout or default ES
request\ntimeout(which is 5m and
hardcoded\n[here](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_rule_task_timeout.ts)).\n\nIf
ES|QL rule performs a single long-running ES query, it can time
out\nafter 5m due to this ES request timeout. This value can't be
changed,\nunlike rule execution timeout. It can be overwritten in Kibana
config\n\n```\nxpack.alerting.rules.run:\n timeout: '10m'\n
ruleTypeOverrides:\n - id: 'siem.esqlRule'\n timeout: '15m'\n```\nSo, we
can encounter situations when rule fails execution after 5m due\nto ES
request timeout, despite a fact it configured with longer timeout\nof
15m\n\nBy using async query, we can overcome this limitation and can
poll async\nquery results until it completes or rule timeouts\n\nMore
details in
internal\n[issue](https://github.com/elastic/sdh-security-team/issues/1224)\n\n---------\n\nCo-authored-by:
Ryland Herrick
<ryalnd@gmail.com>","sha":"3d7aac1a443092ebdbc20fbd9345d373bcb16c48"}},{"branch":"8.19","label":"v8.19.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"url":"https://github.com/elastic/kibana/pull/218567","number":218567,"state":"OPEN"},{"branch":"8.18","label":"v8.18.1","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"9.0","label":"v9.0.1","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
davismcphee pushed a commit to davismcphee/kibana that referenced this pull request Apr 22, 2025
…#216667)

## Summary

- addresses elastic/security-team#11116 (list
item 2)

Introducing an async query makes it possible to overcome the ES request
timeout for long-running rules and queries.

The timeout for an ES request is [defined in the alerting
framework](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_es_request_timeout.ts#L21)
and is the smaller of the rule execution timeout and the default ES
request timeout (which is 5m and hardcoded
[here](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_rule_task_timeout.ts)).
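
As a rough illustration of the "smaller of the two" rule described above, here is a minimal sketch. The function and constant names are hypothetical and are not taken from the linked Kibana files; only the 5m default and the min-of-two behaviour come from the description.

```typescript
// Hypothetical constant: the 5m default ES request timeout mentioned above.
const DEFAULT_ES_REQUEST_TIMEOUT_MS = 5 * 60 * 1000;

// Hypothetical helper: the effective ES request timeout is the smaller of
// the rule execution timeout and the default ES request timeout.
function effectiveEsRequestTimeoutMs(ruleExecutionTimeoutMs: number): number {
  return Math.min(ruleExecutionTimeoutMs, DEFAULT_ES_REQUEST_TIMEOUT_MS);
}

// A rule configured with a 15m execution timeout is still capped at 5m per request.
const cappedTimeoutMs = effectiveEsRequestTimeoutMs(15 * 60 * 1000);
```

Under this assumption, raising the rule execution timeout above 5m does not lengthen any single ES request, which is the gap the async query is meant to close.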

If an ES|QL rule performs a single long-running ES query, it can time out
after 5m due to this ES request timeout. Unlike the rule execution
timeout, this value can't be changed. The rule execution timeout can be
overridden in the Kibana config:

```
xpack.alerting.rules.run:
  timeout: '10m'
  ruleTypeOverrides:
    - id:  'siem.esqlRule'
      timeout: '15m'
```
So a rule can fail after 5m due to the ES request timeout, despite
being configured with a longer timeout of 15m.

By using an async query, we can overcome this limitation and poll the
async query results until the query completes or the rule times out.
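
The polling idea above can be sketched roughly as follows. This is not the code from this PR: `AsyncQueryClient`, `runAsyncQuery`, `pollIntervalMs`, and `shouldStop` are hypothetical names, and the client interface stands in for whatever wraps the Elasticsearch ES|QL async query endpoints (`POST /_query/async` to submit, `GET /_query/async/<id>` to fetch status and results).

```typescript
// Hypothetical shape of an async query response: an id to poll with,
// a running flag, and the result once the query has finished.
interface AsyncQueryStatus<T> {
  id?: string;
  isRunning: boolean;
  result?: T;
}

// Hypothetical client abstraction over the submit and get calls.
interface AsyncQueryClient<T> {
  submit(query: string): Promise<AsyncQueryStatus<T>>;
  get(id: string): Promise<AsyncQueryStatus<T>>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Submit the query, then poll until it completes or the caller signals
// that the rule itself has run out of time.
async function runAsyncQuery<T>(
  client: AsyncQueryClient<T>,
  query: string,
  opts: { pollIntervalMs: number; shouldStop: () => boolean }
): Promise<T> {
  let status = await client.submit(query);
  while (status.isRunning) {
    if (opts.shouldStop()) {
      throw new Error('Rule timed out while waiting for async query');
    }
    if (status.id === undefined) {
      throw new Error('Async query is still running but returned no id');
    }
    await sleep(opts.pollIntervalMs);
    status = await client.get(status.id);
  }
  return status.result as T;
}
```

Each individual request in this loop is short, so no single call needs to outlive the ES request timeout; the overall wait is bounded by the `shouldStop` check instead.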

More details in internal
[issue](elastic/sdh-security-team#1224)

---------

Co-authored-by: Ryland Herrick <ryalnd@gmail.com>
akowalska622 pushed a commit to akowalska622/kibana that referenced this pull request May 29, 2025
…#216667)

Labels

- `backport:version`: Backport to applied version labels
- `release_note:skip`: Skip the PR/issue when compiling release notes
- `Team:Detection Engine`: Security Solution Detection Engine Area
- `Team: SecuritySolution`: Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.
- `v8.18.1`
- `v8.19.0`
- `v9.0.1`
- `v9.1.0`

6 participants